TPEHEP
Prince
OK look at attached 2 it said some will be lost because of the UTF coding??
All was lost

You specified a false coding by using the header with the Latin1 encoding. UTF-8 needs to be used here.
Looks like if is supposed to except at the top, but that is a simple, copy and paste.
Another thing: when posting something like this, put it in a CODE wrap (see attached, just click on it). And thx for trying, everything helps no matter how big or small.
I tell you what, I will make a text file just for YOU for Russian, OK? (See attached 2)
You specified a false coding by using the header with the Latin1 encoding. UTF-8 needs to be used here.
But from some tests I have run today it seems like the XML parser used by Civ ignores the encoding tag and does not properly read it into the unicode strings used internally.
I am investigating if I can get it to yield the unchanged raw UTF-8 encoded text and then convert it myself to the internal wide character unicode.
Edit: Success. Reading in the raw string as a narrow string and then calling the Windows API to convert it seems to work. But I still need to write that code properly and investigate how far that affects the existing texts (I guess they will all need to be converted to UTF-8 wherever ä, ö or similar are used directly).
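To illustrate the approach described in this post, here is a minimal Python 3 sketch (the mod's real fix lives in the DLL and uses a Windows API call that the post does not name). It assumes the parser hands back each UTF-8 byte as one Latin-1 character, which is what the tests above suggest; `recover_utf8` is a hypothetical helper name.

```python
def recover_utf8(misread: str) -> str:
    """Undo a Latin-1 misreading of UTF-8 encoded text.

    Assumption: every character in `misread` stands for exactly one raw byte
    (code point 0-255), as produced by a parser that always assumes Latin-1.
    """
    raw = misread.encode("latin-1")   # back to the unchanged raw bytes
    return raw.decode("utf-8")        # decode them with the real encoding


original = "Привет, мир"
raw_bytes = original.encode("utf-8")

# What a parser that ignores the encoding tag and assumes Latin-1 would yield.
misread = raw_bytes.decode("latin-1")

print(repr(misread))            # mojibake, two characters per Cyrillic letter
print(recover_utf8(misread))    # the original text again
```

Pure ASCII text passes through unchanged, so only entries that contain ä, ö or similar stored directly in a non-UTF-8 file would break under this scheme.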
Latin1, aka ISO-8859-1, is an 8-bit character set that contains most of the characters used by Western European languages, and all of the translations that we currently have use only those characters. Most texts even use only the 7-bit subset of it that is equivalent to ASCII and use HTML encoding for the other characters. Setting those to ISO-8859-1 is fine (and it seems like the XML parser in Civ ignores that encoding setting anyway and always reads with that encoding).
I thought something was wrong. But the coding changed it to work that way, I bet a lot of the TXT is incorrect then.
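As a rough illustration of the ASCII / Latin-1 / HTML-encoding distinction made here, the following Python 3 sketch reports the narrowest of the three character sets that can hold a string and shows the HTML-style escaping that keeps a file pure ASCII. The helper names are hypothetical and the sample strings are made up, not taken from the mod's text files.

```python
import html


def narrowest_charset(text: str) -> str:
    """Return 'ascii', 'latin-1' or 'utf-8', whichever is the first to fit."""
    for name in ("ascii", "latin-1"):
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    return "utf-8"


def to_ascii_with_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


for sample in ("Civilization", "Größe", "Привет"):
    print(sample, "->", narrowest_charset(sample))

print(to_ascii_with_entities("Größe"))
print(html.unescape("Gr&ouml;&szlig;e"))
```

Text that reports `ascii` reads the same under ASCII, Latin-1 and UTF-8, which is presumably why most of the existing entries were never affected by the encoding header.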
There are 17456 lines of text, then.
My modding team may try to help by providing a Polish translation, if we get access to SVN.
We have recently been translating a mod with about 60 000 lines of text with only meager knowledge of the language (the mod was Russian, most of us didn't know Russian), so this would be much easier. Still, that took us about 2 years, so I wouldn't expect us to be done in 2-3 months, but rather longer.
This does not work for solving the UTF-8 problem.
we just need to provide any font instead of sylfaen and use it
But when you pass it to one of the widgets that are in the exe now, then it seems like any wide character that has 0 as its first byte is retrieved from the TTF, while e.g. Cyrillic Unicode text only appears as blank.
Actually all Latin-1 characters, not only the ASCII ones.
so all ASCII characters would come from the sylfaen font and all non-ASCII characters from the TGA symbol font?
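A small Python 3 sketch of the observation above, assuming "0 as first byte" means the high byte of a 16-bit wide character (so the code point is at most 0xFF, which is exactly the Latin-1 range). The function name is hypothetical, and how the exe really picks its glyphs is not known here.

```python
def high_byte_is_zero(ch: str) -> bool:
    """True if a single character fits in the low byte of a 16-bit wide char."""
    return ord(ch) <= 0xFF


def split_by_font_source(text: str):
    """Split text into characters that would come from the TTF and the rest."""
    from_ttf = [ch for ch in text if high_byte_is_zero(ch)]
    blank = [ch for ch in text if not high_byte_is_zero(ch)]
    return from_ttf, blank


for ch in "aäßЯж":
    # Big-endian UTF-16 makes the high byte visible as the first byte.
    print(ch, ch.encode("utf-16-be").hex(), high_byte_is_zero(ch))

print(split_by_font_source("Größe Яж"))
```

Under that assumption ä and ß land on the TTF side together with plain ASCII, while every Cyrillic letter falls into the group that shows up blank.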
Not sure what you mean with code here.
how many characters would need to be included, and wouldn't that be a lot of code? I propose to do one code for all symbols by using the Unicode glyph number.
It seems like my previous assumption was not entirely correct. Using ä and the like directly and then storing it as UTF-8 works, but the usage of ä or the equivalent HTML code without storing it as UTF-8 doesn't. Neither does the HTML code as UTF-8.
now this latest change did mess up quite a lot of the German entries. The perfect challenge to finally get all the files done the right way. As soon as I have finished adapting and programming my little editor, this will proceed much faster than until now.
The Pedia currently uses the standard Python sort/compare, which sorts by byte value and not in the usual dictionary order.
Still, it shows another issue with the Civpedia: it should sort the entries in another way. The current sorting of the Civpedia shows its weakness: it just takes the first letter of each entry and builds the index out of these letters.
Better (especially for German) would be:
- ignoring case, so sorting "g" together with "G" and so on (aside from the fact that entries should not start with lowercase characters anyway)
- ignoring the Ää / Öö / Üü, treating these letters just like Aa / Oo / Uu, as is common in dictionaries too (the ß / &szlig; would count as "ss").
At the moment, the additional index page entries clutter the selection, as can be seen in the screenshot.
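The two rules proposed above could be sketched as a sort key like the one below. This is a hedged Python 3 illustration with made-up entries and hypothetical helper names, not the Pedia's actual code (Civ4 ships a much older Python, where the string handling differs).

```python
# Fold table for the rules above: umlauts sort like their base vowel, ß like "ss".
_GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def german_sort_key(entry: str) -> str:
    """Case-insensitive key that treats Ää/Öö/Üü as Aa/Oo/Uu and ß as ss."""
    return entry.lower().translate(_GERMAN_FOLD)


def index_letter(entry: str) -> str:
    """Index page letter derived from the folded key instead of the raw text."""
    return german_sort_key(entry)[:1].upper()


entries = ["Zelt", "Ägypten", "gold", "Gabel", "Öl", "Straße", "Strasse", "Apfel"]

print(sorted(entries))                       # plain code point order
print(sorted(entries, key=german_sort_key))  # dictionary-like order
print(sorted({index_letter(e) for e in entries}))
```

With the folded key, "Ägypten" is filed under A and "Öl" under O, so the index no longer gets separate pages for Ä, Ö or lowercase letters.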
While that would be possible, I would prefer a backward-compatible way so we don't have to change all of it, and especially so that text we add from other mod comps will still work.
I am still working on a sort of converting tool (integrated in my translation GUI), I just need a bit more time until it works, then we will get rid of those oddities anyway. So don't worry about backwards compatibility for German, it is better to have a clean source anyway.
If I am too slow, it could always be done with search/replace to the "right" letter version.
There are special comparison functions that I think rely on the OS for that kind of thing.
I know the Civpedia works this way, but is there a possibility to change it in Python? The lowercase letters should not be a problem there, as entries should not start with lowercase letters; it's just about treating Ää/Öö/Üü like AaOoUu, and maybe the ß as ss too.
I think for the other languages the È and É and similar would have to be sorted the same way, so this would be a bit more complex *scratchhead*
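One way to cover È, É and similar letters without listing every character is Unicode decomposition: split each letter into base letter plus combining accent and drop the accents. The sketch below is Python 3 using the standard `unicodedata` module; `fold_accents` is a hypothetical name, and this is not the OS-dependent comparison mentioned above (for that, the standard `locale` module offers `strcoll` and `strxfrm`, whose results depend on the installed locales).

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Locale-independent sort key: case-folded text with accents removed.

    casefold() lowercases and also turns ß into "ss"; NFD splits letters such
    as É into "E" plus a combining accent, which is then filtered out.
    """
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


entries = ["Zèbre", "Église", "Eau", "Ägypten", "Apfel", "Straße"]

for entry in entries:
    print(entry, "->", fold_accents(entry))

print(sorted(entries, key=fold_accents))
```

The trade-off is that this treats every language alike, so languages that sort an accented letter as a separate letter of the alphabet would still need their own rules.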