
[We the People] Development "diary"

Discussion in 'Civ4Col - We The People' started by Nightinggale, Nov 23, 2018.

  1. Nightinggale

    Nightinggale Chieftain Supporter

    Joined:
    Feb 2, 2009
    Messages:
    3,986
    There was a request for knowing what goes on with development. We don't have a blog, twitter or anything like that, mainly because we do not have anybody to maintain something like that. We are still recruiting more people, which includes people for any task for the mod itself as well as publicity like building a web site.

    For the time being I decided to add this thread to give hints on what is going on regarding development. It's in no way complete and there is way more information available in the git log and the issue system on git. This is just highlights of major events.

    First entry: translation support

    To put it mildly: the vanilla translation support sucks big time. It allows switching the language used in the game, but that's about it for the good part. It doesn't store which language you picked, but rather the index of that language. When reading the text from xml, it will read the language at that index. The setting is stored in CivilizationIV.ini (in My Documents) under the name Language.

    One huge flaw in this system is that if you select language 0, you get language 0 in all mods. This has worked out ok so far because the first 5 languages are set by vanilla, hence the same in all mods. However imagine when we add more languages. We could end up with say Russian as language 6 in one mod and 7 in another mod. Also if we load language 7 in a mod with just 5 languages, all text will vanish.
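
    To make the index problem concrete, here is a minimal Python sketch (not the game's code). The mod names and every language beyond the five vanilla ones are made up for illustration, and returning None stands in for "all text will vanish".

```python
# The five vanilla languages, in the order vanilla uses them.
VANILLA = ["English", "French", "German", "Italian", "Spanish"]

# Hypothetical mods that each appended languages in their own order.
MOD_A = VANILLA + ["Portuguese", "Russian"]            # Russian is index 6
MOD_B = VANILLA + ["Polish", "Portuguese", "Russian"]  # Russian is index 7


def language_at(mod_languages, ini_index):
    """Index-only lookup: return the language at that index, or None if out of range."""
    if 0 <= ini_index < len(mod_languages):
        return mod_languages[ini_index]
    return None  # stands in for the "all text vanishes" case


saved_index = MOD_A.index("Russian")  # the player picked Russian while running MOD_A

print(language_at(MOD_A, saved_index))    # the intended language
print(language_at(MOD_B, saved_index))    # a different language in the other mod
print(language_at(VANILLA, saved_index))  # nothing at all in a five-language mod
```

    The same stored number means three different things depending on which mod reads it, which is the whole problem.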

    Another flaw is that adding a new string (TXT_KEY) requires adding it in all languages and the person adding new text usually doesn't speak all the languages well enough to do the translation work as well as programming. This has resulted in adding English text in non-English languages just because the game requires something to be added.

    With the talk of adding Brazilian Portuguese (and now apparently Russian too), I started working on a complete rewrite of how translations work.

    What I have done

    Since most of the code is hidden inside the exe file (which can't be modded), the language still needs to be stored as a number in CivilizationIV.ini. The problem is not the number itself, but rather that all mods need to agree on which language each number stands for. I added a table for this and added an additional 20 languages to it (for a total of 25). If all mods agree on using this table, the selected language will be kept even when switching mods.

    Getting a list of 25 translations is no good, particularly if there are only 2-3 real translations. I added a new xml file, which tells which languages to add to the language menu. This means it's perfectly fine to use only languages 0, 2 and 17, and they will show up as 3 languages. This allows mods to agree on which language is used for each number while keeping the ability to show different lists ingame. One mod can add a new translation while another mod doesn't. It also allows translations to appear in whatever order the xml modder wants, regardless of the numbers for the languages.
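
    The split between the shared number table and the per-mod menu list can be sketched like this. It is an illustration in Python under my own assumptions, not the DLL code: the names are hypothetical, only the five vanilla entries are real, and "Language17" is a placeholder because the actual table contents are not listed here.

```python
# Shared table: number -> language. Every mod is expected to use the same one.
# Only a few entries are shown; 17 is a placeholder name (assumption).
SHARED_IDS = {
    0: "English",
    1: "French",
    2: "German",
    3: "Italian",
    4: "Spanish",
    17: "Language17",
}

# What one mod's menu file might list, in the order the modder wants it displayed.
MENU_ORDER = [2, 0, 17]


def build_menu(menu_order, shared_ids):
    """Return (menu position, shared id, language name) for each enabled language."""
    return [(pos, lang_id, shared_ids[lang_id]) for pos, lang_id in enumerate(menu_order)]


def ini_value_for_choice(menu_order, menu_position):
    """The number written to the ini file is the shared id, not the menu position."""
    return menu_order[menu_position]


for row in build_menu(MENU_ORDER, SHARED_IDS):
    print(row)
print(ini_value_for_choice(MENU_ORDER, 2))
```

    The menu position is only a display detail. What gets saved is the shared number, so another mod with a different menu still resolves it to the same language.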

    If a translation is missing for a certain TXT_KEY, or the language is missing entirely in the mod, English will be used. This avoids the problem of partial or missing translations resulting in missing text, and it also removes the need to add translations for all languages when adding a new TXT_KEY. This makes it much easier to both add new TXT_KEYs and work on or use partial translations.

    Translations for a TXT_KEY can appear in any order as they are located by tag name rather than index. This means it's possible to add English, German and nothing else. Vanilla wants the second language to be French, but now French is missing (not translated) and people using the French translation will get the English "translation". Vanilla's reaction to a TXT_KEY with just English and German is to read German as French due to only using indexes, not tag names.
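
    The difference between index lookup and tag-name lookup with English fallback can be shown with a small Python sketch. It uses the standard xml.etree module rather than the game's xml reader, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

VANILLA_ORDER = ["English", "French", "German", "Italian", "Spanish"]

# A TXT_KEY with only English and German filled in.
ENTRY = ET.fromstring(
    "<TEXT><Tag>TXT_KEY_UNITS</Tag>"
    "<English>Units</English><German>Einheiten</German></TEXT>"
)


def text_by_index(entry, language):
    """Vanilla-style: take the Nth child after <Tag>, ignoring the element names."""
    children = [child for child in entry if child.tag != "Tag"]
    index = VANILLA_ORDER.index(language)
    return children[index].text if index < len(children) else None


def text_by_tag(entry, language):
    """Look the language up by element name and fall back to English if absent."""
    node = entry.find(language)
    if node is None or not node.text:
        node = entry.find("English")
    return node.text if node is not None else None


print(text_by_index(ENTRY, "French"))  # German text shown as French
print(text_by_tag(ENTRY, "French"))    # English fallback
print(text_by_tag(ENTRY, "German"))    # the real German text
```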

    An interesting twist to using tag names is it's possible to use the "language" Tag. This will make the text ingame appear as the TXT_KEYs. Often useless, but if you spot say a typo ingame, you can switch and get which key you should look up in the text xml files.
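
    Why the "Tag" trick works is easy to see in a sketch: with name-based lookup, Tag is just another child element of the entry. Again this is illustrative Python with a hypothetical function name, not the DLL code.

```python
import xml.etree.ElementTree as ET

ENTRY = ET.fromstring(
    "<TEXT><Tag>TXT_KEY_UNITS</Tag><English>Units</English></TEXT>"
)


def shown_text(entry, language):
    """Return the text of the child element named after the selected language."""
    node = entry.find(language)
    return node.text if node is not None else None


print(shown_text(ENTRY, "English"))  # the normal text
print(shown_text(ENTRY, "Tag"))      # the key itself
```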

    Anybody interested in the code can read it here: https://github.com/We-the-People-civ4col-mod/Mod/commit/560a80b37e9305408f1dd3d44c265975d29ae54a
    Note that it should be possible to copy it to other mods without issues as it doesn't rely on any mod specific code. In fact it doesn't rely on Colonization specific code, meaning it should work in BTS too.
     
  2. Nightinggale

    Last entry for translation support: UTF-8 support

    There is a character encoding issue with vanilla text xml files. Only ASCII characters are guaranteed to stay the same because of the mess with character codepages. This means we need to escape non-ASCII characters like this:
    Code:
        <TEXT>
            <Tag>TXT_KEY_UNITS</Tag>
            <English>Units</English>
            <French>Unit&#233;s</French>
            <German>Einheiten</German>
            <Italian>Unit&#224;</Italian>
            <Spanish>Unidades</Spanish>
        </TEXT>
    Those &# + some number + ; represent characters. Now the same with the escaped characters decoded:
    Code:
        <TEXT>
            <Tag>TXT_KEY_UNITS</Tag>
            <English>Units</English>
            <French>Unités</French>
            <German>Einheiten</German>
            <Italian>Unità</Italian>
            <Spanish>Unidades</Spanish>
        </TEXT>
    Obviously, not having to use HTML-escaped text is far more readable. This is where UTF-8 comes into play.
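
    For anybody who wants to convert between the two forms shown above, Python's standard library can do it in both directions. This is a small helper sketch for working with the files, not part of the mod, and the function names are my own.

```python
import html


def from_escaped(text):
    """Turn numeric character references such as &#233; into real characters."""
    return html.unescape(text)


def to_escaped(text):
    """Turn every non-ASCII character into a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(from_escaped("Unit&#233;s"))
print(to_escaped("Unità"))
```

    Note that html.unescape also decodes named entities such as &amp;amp;, so it should be applied to the text values and not to a whole xml file at once.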

    I changed the reading code so it can read UTF-8 and then convert it to the codepage used by the language in question. This means much easier access for translators and we have greatly reduced the risk of mistakes, both due to reading issues and due to translators of multiple languages messing up the file encoding because they aren't using the same locale.
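
    The conversion step can be sketched in a few lines of Python. This only illustrates the idea and is not the actual C++ change: the language-to-codepage mapping below is an assumption, and replacing characters the codepage can't represent with "?" is my choice for the sketch.

```python
# Assumed mapping from language to Windows codepage (illustration only).
LANGUAGE_CODEPAGE = {
    "English": "cp1252",
    "French": "cp1252",
    "German": "cp1252",
    "Russian": "cp1251",
}


def utf8_to_codepage(raw_utf8, language):
    """Decode UTF-8 bytes from the file and re-encode them in the language's codepage."""
    codepage = LANGUAGE_CODEPAGE.get(language, "cp1252")
    # Characters missing from the codepage become "?" instead of raising an error.
    return raw_utf8.decode("utf-8").encode(codepage, errors="replace")


french = "Unités".encode("utf-8")    # two bytes for the accented letter in the file
russian = "русские".encode("utf-8")  # two bytes per letter in the file

print(utf8_to_codepage(french, "French"))    # one byte per character
print(utf8_to_codepage(russian, "Russian"))  # one byte per character in cp1251
print(utf8_to_codepage(russian, "French"))   # not representable in cp1252
```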

    In order to avoid converting all the files in one go, I made the conversion activate only if an xml file's name contains "utf8" (not case sensitive). If this isn't in the filename, the vanilla approach will be used. The game doesn't have any issue with mixed content, as in some files being UTF-8 encoded and some not.
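
    The filename check amounts to something like the following Python sketch. The function name and the example filenames are hypothetical.

```python
def uses_utf8(filename):
    """True if the xml filename opts in to UTF-8 handling (case-insensitive match)."""
    return "utf8" in filename.lower()


# Hypothetical filenames, for illustration only.
for name in ("Example_Text_utf8.xml", "Example_Text_UTF8.xml", "Example_Text.xml"):
    print(name, uses_utf8(name))
```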

    The actual text ingame still relies on codepages. No change there. It will use the locale set in Windows, which you can change by using this guide. However the game will try to change the locale for the game itself (this works on some computers and not others), so often the player needs to use the locale the computer is already set to. For instance using the French or German translations requires a western European language as locale while Russian requires a Russian locale. Often this is the default in Windows, meaning it works out of the box. The big issue is if the computer thinks it's Russian and the player wants to use say German. The guide in the link should be able to fix it, but it's a part of Windows, which is annoying and part of why using codepages is not recommended anymore.

    The code now supports translation in any of the single byte codepages. Currently I added conversion tables for western European, eastern European and Cyrillic. Technically this is adding support for codepages 1250-1252. It's possible to add support for codepages 1253-1258 too, which adds Hebrew, Greek and whatever else you can find there. I haven't added them because I can always do it later if needed.

    I don't think I will code more translation support for the time being. I can't think of anything to add that isn't a major task in itself, like hacking in Unicode support (which btw would require access to the Japanese exe file, something I don't have).

    The code change. Like the previous one, this isn't mod specific and should work in any mod and in BTS. You can also get the entire change combined.
     
    Last edited: Nov 23, 2018
  3. eXPonent_123

    eXPonent_123 Chieftain

    Joined:
    Mar 25, 2017
    Messages:
    105
    Gender:
    Male
    We can not avoid such characters, as the game has no normal support for the Russian language.
    In ISO-8859-1, characters are replaced with question marks "?".
    And in UTF-8 characters are replaced by French counterparts (symbols with accent grave).

    We use this table to encode characters:
     
    Last edited: Nov 24, 2018
  4. Nightinggale

    The game supports Russian if the system locale is set to Russian or any other language using codepage 1251. There is an issue with GameFont, but we can do something about that. Worst case scenario is to make a new GameFont file and make the game pick the right one at startup depending on system locale. There is support for any 8 bit codepage.

    I changed the text loading code to use tags rather than indexes to avoid that problem. Also I added actual text encoding rather than the vanilla approach, which is to assume the value is the same. That approach only works for western European (codepage 1252).
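
    One way to picture "assume the value is the same" versus actual encoding is the Python sketch below. It is my reading of the vanilla behaviour, not the real code: the character number is used directly as the byte value, and using "?" for anything that doesn't fit in a byte is an assumption.

```python
def vanilla_style(text):
    """Use each character's code point as its byte value; anything above 255 becomes '?'."""
    out = bytearray()
    for ch in text:
        code_point = ord(ch)
        out.append(code_point if code_point < 256 else ord("?"))
    return bytes(out)


def with_encoding(text, codepage):
    """Look each character up in the target codepage instead of reusing its number."""
    return text.encode(codepage, errors="replace")


print(vanilla_style("Unités") == with_encoding("Unités", "cp1252"))  # same bytes
print(vanilla_style("русские"))                                      # no usable bytes
print(with_encoding("русские", "cp1251"))                            # proper cp1251 bytes
```

    For the accented western European letters the character number and the codepage 1252 byte happen to be the same, which is why the shortcut goes unnoticed there. Cyrillic letters have numbers far above 255, so they need a real lookup.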

    Interesting, but it's a dead link. It's most likely in your browser's cache or you are logged in or something.
     
  5. Nightinggale

    Here is an example. I added this to a UTF-8 encoded text file.
    Code:
        <TEXT>
            <Tag>TXT_KEY_MAIN_MENU_SINGLE_PLAYER</Tag>
            <English>русские</English>
        </TEXT>
    Next I hacked the DLL to think English is using codepage 1251, meaning it will use the same character encoding as Russian. In other words I use Russian character encoding ingame despite not having a Russian translation. I then switched my computer's locale to Russian, started the game and I added the result as a screenshot here:
    Russian screenshot.jpg

    Looks just fine to me.
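
    Why the locale matters for a test like this can be shown outside the game with a short Python sketch. It only models the byte interpretation, not the game's rendering, and the function name is hypothetical.

```python
# The bytes kept in memory once the text has been converted to codepage 1251.
stored = "русские".encode("cp1251")


def shown_as(data, locale_codepage):
    """Interpret the stored single-byte text the way a given system codepage would."""
    return data.decode(locale_codepage)


print(shown_as(stored, "cp1251"))  # Russian locale: the intended text
print(shown_as(stored, "cp1252"))  # western European locale: accented Latin letters
```

    The bytes are identical in both cases. Only the codepage used to display them differs, which would explain seeing accented western European letters instead of Cyrillic on a non-Russian locale.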
     
  6. eXPonent_123

    Last edited: Nov 24, 2018
