Translation into Other Languages

For my French translation, I use git.
To update a file, I check what changed in the English file and make the same change in the French file.

If you succeed in making your generator, I will be happy to use it.

Edit: I have just seen CoreText_en_US.xml, and here I was thinking the CBP text files were big...
 
I vote for a single spreadsheet with all the locales, one per module; let me call it locales.csv for now. E.g. Community Patch/CoreText_locales.csv, CBP/CoreText_locales.csv, ...
The first row is: LABEL; en_US; pl_PL; es_ES; fr_FR; ...
The following rows are: MYLABEL_FOR_THIS; This is text in English; (insert text in Polish); Esto es texto en español; Celui ci est text en français; ...

You modders have to use it too. With every version, provide that file so we can translate only the current changes instead of the whole file every time. Whenever you change, add or remove a text, do it in the CSV file. If a text is removed, remove the whole row. If a text is changed, clear all of its translations so we know it needs to be translated again. If a text is added, leave its translations blank.

Make a generator script that converts CoreText_locales.csv into the corresponding CoreText_en_US.xml, CoreText_pl_PL.xml, CoreText_es_ES.xml, ... and use it to generate them, even for English. If a text is missing in any language, the generator will take the original English text.
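A minimal sketch of such a generator, assuming the semicolon-separated layout described above and the usual Civ 5 <Language_xx_XX> / <Row Tag="..."> / <Text> layout for text XML; the file names and column order are only illustrative:

# csv_to_xml.py -- sketch: expand CoreText_locales.csv into one text XML file per locale.
# Assumes a semicolon-separated file whose header row is: LABEL;en_US;pl_PL;es_ES;fr_FR;...
import csv
from xml.sax.saxutils import escape

def generate(csv_path="CoreText_locales.csv"):
    with open(csv_path, encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f, delimiter=";"))
    locales = rows[0][1:]                      # en_US, pl_PL, es_ES, ...
    en = locales.index("en_US")
    for i, locale in enumerate(locales):
        out = ['<?xml version="1.0" encoding="utf-8"?>', "<GameData>", "\t<Language_%s>" % locale]
        for row in rows[1:]:
            label, texts = row[0], row[1:]
            # Fall back to the English text when the translation is missing or blank.
            text = texts[i].strip() if i < len(texts) and texts[i].strip() else texts[en]
            out.append('\t\t<Row Tag="%s">\n\t\t\t<Text>%s</Text>\n\t\t</Row>' % (label, escape(text)))
        out += ["\t</Language_%s>" % locale, "</GameData>"]
        with open("CoreText_%s.xml" % locale, "w", encoding="utf-8") as f:
            f.write("\n".join(out) + "\n")

generate()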

We will download only the locales CSV file, translate it fully or partially, and upload it again. Git is only needed when there is more than one translator per language; any uploading service will do, since it looks like there will be just one translator per language at any given moment.

Your development pace is so fast that we may end up sending translated files that are 3-4 versions old. And you aren't going to wait a couple of days before a release, so maybe betas shouldn't be translated. Alternatively, we could work on a file that contains only the recent changes: you remove the texts that need removing from the base file and leave us a file with only the texts that need translation. Easy and fast.
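That "only the recent changes" file could even be generated by diffing two versions of the CSV; a rough sketch, with hypothetical file names and assuming the en_US column is the reference text:

# pending_texts.py -- sketch: list the labels that need (re)translation for one locale,
# i.e. labels that are new, whose English text changed, or whose translation is blank.
import csv

def load(path):
    with open(path, encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f, delimiter=";"))
    return rows[0], {r[0]: r for r in rows[1:]}

def pending(old_path, new_path, locale):
    header, new = load(new_path)
    _, old = load(old_path)
    col = header.index(locale)
    for label, row in new.items():
        translated = col < len(row) and row[col].strip()
        changed = label not in old or old[label][1] != row[1]     # column 1 = en_US
        if changed or not translated:
            print(label, ";", row[1])

pending("CoreText_locales_old.csv", "CoreText_locales.csv", "es_ES")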

EDIT: I am really concerned that if we don't use this protocol or a similar one, we might end up translating the same texts over and over.
EDIT2: With the proposed method, we could install CPP for Civ 5 in any language, even one that isn't translated; it would just show the texts in English. Currently, if we install CPP in any language other than English, it simply doesn't show the correct texts, to the point that it's unplayable.
 

I second this. Automating this process is the smartest and most efficient way of accomplishing this.
 
I see the Polish translation has been removed. Could you please explain what happened, so we can avoid the same mistakes?

Users noted that the diacritics were not showing up in the translation, and that all text was being pressed against the left-hand side of the screen. Also, some words were running together, creating long nonsense words.

G
 
I just uploaded it to GitHub.

G
I've checked the file on GitHub with Notepad++. The only difference I can see is that every line ending is marked only [LF], whereas in the original file it's [CR][LF].
I don't know if the missing carriage return might break something.
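If it helps, the line endings can also be checked outside Notepad++; a tiny sketch (the file name is just an example):

# line_endings.py -- sketch: count CRLF vs bare LF line endings in a file.
data = open("CoreText_pl_PL.xml", "rb").read()
crlf = data.count(b"\r\n")
print(crlf, "CRLF endings,", data.count(b"\n") - crlf, "bare LF endings")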


By the way, it's a nightmare to track every text file. Also, every mod in CPP has its own schema, which makes it even more complicated.
 

Not sure about the LF/CR issue. I didn't mess with the line endings, I just dragged and dropped the files into ModBuddy. I wonder if ModBuddy did something dumb.

Yeah, the organization is not ideal, but it is better now than it was. At least all the text files are in the same spot.

Once development slows to the point that there aren't many text changes (we're getting there fairly soon), a master file with all text changes would probably help. Something akin to the SQL file described above.

G
 
I am having the same problem. It seems to be an issue with SQL and UTF-8: we cannot use Unicode text directly in an SQL file. More precisely, Unicode characters are replaced by three spaces. I have tried several things:
* Beginning the file with SET NAMES utf8 (the file isn't even read)
* Appending +char(nchar(unicode character)) to the text (the statement is ignored)
* Many guides recommend the use of COLLATION, but I have no idea how to do it.

I have also been looking at how the official DLCs handle this, and I noticed that everything is in XML, save for a few SQL files. Inside the SQL files, the only statement related to translation that I have found is this one (from the Mongol scenario in DLC01):
-- Replace scenario specific entries.
INSERT OR REPLACE INTO LocalizedText (Language, Tag, Gender, Plurality, Text) Select Language, Replace(Tag, 'TXT_KEY_MONGOL_SCENARIO_', 'TXT_KEY_'), Gender, Plurality, Text from LocalizedText where Tag LIKE 'TXT_KEY_MONGOL_SCENARIO_%';

So the SQL there doesn't contain any actual text.
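Given that, it may be worth checking whether any of our SQL files still carry accented text that should be moved into the XML files instead. A small sketch (the search pattern is only an example):

# find_nonascii_sql.py -- sketch: flag lines in SQL files containing non-ASCII characters,
# since those are the ones that get mangled and are safer kept in the XML text files.
import glob

for path in glob.glob("**/*.sql", recursive=True):
    with open(path, encoding="utf-8", errors="replace") as f:
        for n, line in enumerate(f, 1):
            if any(ord(ch) > 127 for ch in line):
                print("%s:%d: %s" % (path, n, line.rstrip()))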
 
Definitely an issue with encoding. Not sure if UTF-8 is correct for those files.
In my tests the encoding should be Central European (Windows 1250).

Take a simple example from the text for the American UA:
This line can't be posted into the Forum because it gets converted into "smilies" -- UTF-8
Wszystkie l¹dowe jednostki wojskowe otrzymuj¹ -- Western (Windows 1252)
Wszystkie l¹dowe jednostki wojskowe otrzymuj¹ -- ISO 8859-1
Wszystkie lıdowe jednostki wojskowe otrzymujı -- ISO 8859-3
Wszystkie lądowe jednostki wojskowe otrzymują -- Central European (Windows 1250)
Wszystkie lšdowe jednostki wojskowe otrzymujš -- ISO 8859-2
Wszystkie l№dowe jednostki wojskowe otrzymuj№ -- Cyrillic (Windows 1251)

Clearly the garbled lines are wrong; even without understanding Polish I can see they don't work. The Central European (Windows 1250) version is the only one that translates into English via Google, so I'm assuming that one is correct.
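The same comparison can be reproduced from a script by decoding the raw bytes with each candidate code page; a sketch, with an illustrative file name:

# guess_encoding.py -- sketch: decode the same raw bytes with several code pages
# and print the first characters of each result, like the manual comparison above.
raw = open("CoreText_pl_PL.xml", "rb").read()
for codec in ("utf-8", "cp1252", "iso-8859-1", "iso-8859-3", "cp1250", "iso-8859-2", "cp1251"):
    try:
        print(codec, "->", raw.decode(codec)[:60])
    except UnicodeDecodeError:
        print(codec, "-> cannot decode with this code page")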
 

UTF-8 is fine for Spanish; I've never had a problem with XML encoded in UTF-8, at least in the official DLCs. The problem appears when writing texts in SQL files. UTF-8 is usually accepted by default in MySQL, but apparently not in Civ 5's SQL. I still have to check some texts from the CBP XML files to make sure this is the problem.

EDIT: I can confirm that translating XML files works as intended.
 
I have translated some mods into Japanese.
As far as I know, we MUST use UTF-8 in XML and SQL files.
Exactly - it's all about using UTF-8. It turned out that ModBuddy tends to save files using ANSI encoding, which caused the problems mentioned above.
 
I don't see that option. Once you convert, the diacritics become unreadable.

A little tutorial, just in case:

In the "Encoding" menu there are two kinds of entries: encode and convert.

1) You have an ANSI file ("Encode in ANSI" is checked).

2a) If you click "Encode in UTF-8", it only changes the declared encoding; your diacritics get messed up because the bytes are still ANSI.

2b) If you click "Convert to UTF-8", it changes the encoding to UTF-8 and converts all the diacritics from the ANSI code page to the UTF-8 one, so everything comes out right. No further work to do.

Note:
UTF-8 BOM: it's three bytes at the front of the file that say "hi! I'm a UTF-8 file!", but some programs don't recognize it, and it can create bugs.
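When there are many files to fix, the same "convert" step can be scripted; a sketch assuming the sources are Windows-1250 ("ANSI" on a Central European system) and should end up as UTF-8 without a BOM (the glob pattern is only an example):

# ansi_to_utf8.py -- sketch: re-encode every XML file from Windows-1250 to UTF-8 without BOM,
# the scripted equivalent of Notepad++'s "Convert to UTF-8" (not "Encode in UTF-8").
import glob

for path in glob.glob("*.xml"):
    text = open(path, encoding="cp1250").read()      # decode the ANSI bytes
    open(path, "w", encoding="utf-8").write(text)    # write them back out as UTF-8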
 