

j_mie6
Apr 06, 2012, 02:52 PM
The XML text cleaner! (V3)

Download (http://forums.civfanatics.com/downloads.php?do=file&id=18971)

The XML text cleaner is a utility written in Python that searches through a mod's xml files and makes sure that all Descriptions etc. have text tags in the Assets/XML/Text directory

for example say I had made a new civilization like this:


<CivilizationInfo>
<Type>CIVILIZATION_VENICE</Type>
<Description>The Venetian Empire</Description>
<ShortDescription>Venice</ShortDescription>
<Adjective>Venetian</Adjective>
<Civilopedia>Venice is a city in northern Italy bla bla bla...</Civilopedia>


then running the cleaner would convert it to this:


<CivilizationInfo>
<Type>CIVILIZATION_VENICE</Type>
<Description>TXT_KEY_VENICE_DESC</Description>
<ShortDescription>TXT_KEY_VENICE_SHORT_DESC</ShortDescription>
<Adjective>TXT_KEY_VENICE_ADJECTIVE</Adjective>
<Civilopedia>TXT_KEY_VENICE_PEDIA</Civilopedia>


with some text tags of


<TEXT>
<Tag>TXT_KEY_VENICE_DESC</Tag>
<English>The Venetian Empire</English>
<French>The Venetian Empire</French>
<German>The Venetian Empire</German>
<Italian>The Venetian Empire</Italian>
<Spanish>The Venetian Empire</Spanish>
</TEXT>
<TEXT>
<Tag>TXT_KEY_VENICE_SHORT_DESC</Tag>
<English>Venice</English>
<French>Venice</French>
<German>Venice</German>
<Italian>Venice</Italian>
<Spanish>Venice</Spanish>
</TEXT>


etc...
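
A rough sketch of how keys like the ones above could be derived, assuming the key is simply the Type minus its first word plus a suffix per node. The function make_text_key and the SUFFIXES table are made up for illustration; the cleaner's real naming rule may differ.

```python
# Suffixes copied from the Venice example; the lookup table itself is hypothetical.
SUFFIXES = {
    "Description": "DESC",
    "ShortDescription": "SHORT_DESC",
    "Adjective": "ADJECTIVE",
    "Civilopedia": "PEDIA",
}

def make_text_key(type_name, node_name):
    """Build a TXT_KEY_* name from an object's Type and the node being fixed."""
    # CIVILIZATION_VENICE -> VENICE (drop the leading category word, if any)
    short_name = type_name.split("_", 1)[-1]
    return "TXT_KEY_%s_%s" % (short_name, SUFFIXES[node_name])

key = make_text_key("CIVILIZATION_VENICE", "ShortDescription")
```

Here key ends up as TXT_KEY_VENICE_SHORT_DESC, matching the example.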

Installation and instructions for use!

usage is very easy, first install python 2.7 from www.python.org (http://www.python.org/download/) (this is required for either version).

open the module with IDLE, then edit the options to fit your mod's xml path/documents (see below) and press F5 to run. (this is if you are using V1, or if the variable bGraphics is set to False in V2+)

if using V2/V3 simply doubleclick the XML text cleaner.py file and it will open (like image 2 below) then press the options button to enter mod path etc and then press run Cleaner! In GraphicalInterface.py there are some options that allow you to change the colour, font type, font colour and font size of the main box!

there are many options that the user can use.

##Options##

#Enter mod xml location (Files are automatically backed up to the path specified in backupPath)
xmlPath = "C:/Program Files/2K Games/Firaxis Games/Sid Meier's Civilization 4 Complete/Beyond the Sword/Mods/[your mod name]/Assets/XML"

#Backed up files will be placed in this path in a folder called XMLBackup, MAKE SURE THIS POINTS TO A VALID FILE PATH!
backupPath = "C:/Users/User/Documents"

#Here add files that should ignore the checks (for example EffectInfos has many wrong references, but it is on purpose!)
lIgnores = ["CIV4EffectInfos.xml", ]

#The name of the new text xml (if one doesn't exist this will be the name of the new one!)
textFilename = "CIV4CustomText.xml"

#Language Options (set to True to create language node in xml!)
bFrench = True
bGerman = True
bItalian = True
bSpanish = True

#Search Options (set to True to enable searching for this node!)
bDescription = True
bPedia = True
bStrategy = True
bHelp = True
bShortDescription = True
bAdjective = True
bCityNames = True
bQuotes = True


if I was to set all the Language options to False any text tags created would be in the form of:


<TEXT>
<Tag>TXT_KEY_SOME_TEXT</Tag>
<English>some text</English>
</TEXT>


and turning search options to False ignores that type of node (e.g. setting bPedia to False would mean that the program wouldn't bother to check the Civilopedia tags for incomplete text.)
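
As an illustration of the language options, here is a minimal sketch that builds one TEXT node with only the enabled languages. It uses xml.dom.minidom like the cleaner does, but build_text_node and enabled_languages are hypothetical helpers, not functions from the download.

```python
import xml.dom.minidom as mini

def enabled_languages(bFrench, bGerman, bItalian, bSpanish):
    """Turn the four language options into a list of node names."""
    flags = [("French", bFrench), ("German", bGerman),
             ("Italian", bItalian), ("Spanish", bSpanish)]
    return [name for name, enabled in flags if enabled]

def build_text_node(doc, tag, en_value, languages):
    """Create a <TEXT> element with Tag, English and one node per language."""
    node_text = doc.createElement("TEXT")
    for name, value in [("Tag", tag), ("English", en_value)]:
        child = doc.createElement(name)
        child.appendChild(doc.createTextNode(value))
        node_text.appendChild(child)
    for name in languages:
        # every extra language starts out as a copy of the English text
        child = doc.createElement(name)
        child.appendChild(doc.createTextNode(en_value))
        node_text.appendChild(child)
    return node_text

doc = mini.Document()
languages = enabled_languages(False, False, False, False)
node = build_text_node(doc, "TXT_KEY_SOME_TEXT", "some text", languages)
```

With every language option off, node.toxml() contains just the Tag and English children.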

When entering paths make sure you use either / or \\ eg

C:/Users/User/Documents or C:\\Users\\User\\Documents
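
A small sketch of why a single backslash is a problem in a Python string. The raw-string form (r"...") is an extra option not mentioned above, and ntpath is only used here to compare the spellings.

```python
import ntpath  # Windows path rules, importable on any platform

forward = "C:/Users/User/Documents"
escaped = "C:\\Users\\User\\Documents"
raw = r"C:\Users\User\Documents"      # raw string: backslashes are kept literally

# A single backslash starts an escape sequence: "\n" below is a newline,
# so this is NOT the path it looks like.
broken = "C:\new_mod"

# All three safe spellings name the same folder.
same = ntpath.normpath(forward) == escaped == raw
```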

When the program starts it will back up the mod's files in the path specified in options. When the program runs again it will delete the previous backup and create a new one. Make sure that when you are finished you check that there are no xml errors before running the program again!
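
The backup behaviour described above could look roughly like this. It is only a sketch: backup_xml is a hypothetical name, and the cleaner's own backup code may work differently.

```python
import os
import shutil

def backup_xml(xml_path, backup_path):
    """Copy the mod's XML folder into <backup_path>/XMLBackup, replacing any old backup."""
    target = os.path.join(backup_path, "XMLBackup")
    if os.path.isdir(target):
        # the previous backup is thrown away on every run
        shutil.rmtree(target)
    shutil.copytree(xml_path, target)
    return target
```

Because the old backup is deleted first, a second run on broken files would overwrite the last good copy, which is why checking for xml errors between runs matters.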

Future features include an xml file for options instead of in the .py file (for non graphical setup) and a fix for the Italian and Spanish translations. More intelligent text tag management will also be implemented to speed up translation time.

Post any bugs, questions or requests here!

Enjoy! :D

dacubz145
Apr 06, 2012, 04:10 PM
Very nice work, thanks for taking up my request :D

I am going to update my mod in a day or two then im going to run this, since my backup will be uploaded here

j_mie6
Apr 06, 2012, 04:25 PM
good luck!

j_mie6
Apr 06, 2012, 05:22 PM
oh and truthfully, this project was a lot of fun and a good learning curve :p Well spent 12 straight hours of programming :rolleyes: :D

The_J
Apr 06, 2012, 05:55 PM
Congrats :goodjob:.
Sounds like an useful thing for everyone who started with the lazy way :D.

j_mie6
Apr 07, 2012, 10:55 AM
Ok I have finished work on the graphical interface :D

to run with the graphical interface you just need to double click on the XML text cleaner.py file and it is ready to go. Will upload later but going to put on a teaser picture :p

Notes
Will work on autofill of the options box so that user doesn't have to enter the path every single time they use it

dacubz145
Apr 07, 2012, 11:20 AM
Very interested to see the interface

j_mie6
Apr 07, 2012, 11:22 AM
Going to upload it by the end of the day (and update the original post, as you can use one of the options in the .py file to turn off graphics (means you don't have to enter in the paths every time, but I hope to fix that soon with my good old fashioned pickle jar (don't ask :p)))

but the pictures are up now :p

also you can change the bg colour, font colour, font size and font type of that main box in the GraphicalInterface.py file :D

dacubz145
Apr 07, 2012, 11:40 AM
Looks really good, this is definitely coming along real nice!

j_mie6
Apr 07, 2012, 12:07 PM
Version 2 is up!

j_mie6
Apr 09, 2012, 03:37 AM
Ok I have a method of translation and I am working on getting it up and running as soon as possible

dacubz145
Apr 09, 2012, 03:18 PM
that is nice! how did you figure it out?

j_mie6
Apr 09, 2012, 03:27 PM
myGengo API :p currently having problems with Italian and Spanish though. The code is telling me they don't work

PsiCorps
Apr 27, 2012, 04:00 PM
This Utility program looks like it will be very useful for me however, you said first install python 2.7 from www.python.org (this is required for either V1 or V2)
Specifically which variation of 2.7? I use Win XP and am not as computer literate as most.

Gzipped source tar ball (2.7.3) (sig)
Bzipped source tar ball (2.7.3) (sig)
XZ source tar ball (2.7.3) (sig)
Windows x86 MSI Installer (2.7.3) (sig)
Windows x86 MSI program database (2.7.3) (sig)
Windows X86-64 MSI Installer (2.7.3) [1] (sig)
Windows X86-64 program database (2.7.3) [1] (sig)

Please help me out.

j_mie6
Apr 28, 2012, 02:15 AM
Really I should have put python 2.7.? because you just take the latest version ie 2.7.3

then it is up to you which file you choose however I would say go for the top one on the page:

•Python 2.7.3 Windows Installer (Windows binary -- does not include source)
option as then it will install easily

hope that helped!

PsiCorps
Apr 28, 2012, 11:27 PM
Perfect, just what I needed.

Thank you.

dacubz145
Apr 28, 2012, 11:55 PM
any update on the translator?

j_mie6
Apr 29, 2012, 04:32 AM
haven't got around to that yet, I still want my e-mail reply from the developer on why italian and spanish are both not working

j_mie6
May 06, 2012, 05:04 PM
right, the translator is finished! when the cleaner makes a new text tag it auto translates, but there is another button on the GUI that allows you to translate all your mod's existing text tags as well. The danger is that if you have the typical:


<TEXT>
<Tag>TXT_KEY_ETC</Tag>
<English></English>
<French></French>
<German></German>
<Italian></Italian>
<Spanish></Spanish>
</TEXT>


and only select French translation ALL your text tags will become:


<TEXT>
<Tag>TXT_KEY_ETC</Tag>
<English></English>
<French></French>
</TEXT>


so make sure you select all the languages you want!!! this will be uploaded soon.

if the translation fails the tags will be the English equivalents. For the moment the Italian and Spanish translations are not working (something to do with MyGengo) and just come out as the English text.

How the translator works (for those that are interested):

when the Translate button is clicked, or a new tag is made by the program, the program hands over to the Translator class. In the case of a new tag the Translator's static get<Language>() methods (getFrench(), getGerman() and so on) are used to fetch the translated text; each one basically sends a request to the MyGengo API and waits for the response. With the GUI's button a new instance of the Translator class is created and passed the path and other options, and the __init__ then removes the unneeded options. Next, doTranslate is called and gets the files from the Text folder. The text tags are "replaced" to sort out problems with missing language tags during translation itself: the replaceTags method calls the classmethod disectNode(), which destroys a TEXT node and throws back its mangled body parts, and replaceTags then proceeds to perform life saving operations on the data to re-form the node with the required language tags. Now that it's all fine, doTranslate swaps the language info for the translated strings and saves the xml (after reformatting). finished :p. pretty nice setup I think.
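
The "take the node apart and rebuild it" step can be sketched like this. The two functions are hypothetical stand-ins for disectNode() and replaceTags(), whose real code is not shown in the thread, and filling an empty language node with the English text is an assumption.

```python
import xml.dom.minidom as mini

def dissect_text_node(node_text):
    """Return {child tag name: text} for one <TEXT> element."""
    parts = {}
    for child in node_text.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            parts[child.tagName] = "".join(
                c.data for c in child.childNodes if c.nodeType == c.TEXT_NODE)
    return parts

def rebuild_text_node(doc, parts, languages):
    """Re-create the <TEXT> element with exactly Tag, English and the chosen languages."""
    node_text = doc.createElement("TEXT")
    english = parts.get("English", "")
    for name in ["Tag", "English"] + list(languages):
        value = parts.get(name, "")
        if name != "Tag" and not value:
            value = english  # missing or empty language: start from the English text
        child = doc.createElement(name)
        child.appendChild(doc.createTextNode(value))
        node_text.appendChild(child)
    return node_text

source = "<TEXT><Tag>TXT_KEY_ETC</Tag><English>Hello</English><German></German></TEXT>"
doc = mini.parseString(source)
parts = dissect_text_node(doc.documentElement)
rebuilt = rebuild_text_node(doc, parts, ["French"])
```

Note that the rebuilt node only has the languages passed in, so the German node from the source is gone. That is the same danger as described above when only French is selected.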

dacubz145
May 06, 2012, 05:32 PM
Can't wait till you upload it:D

j_mie6
May 12, 2012, 11:39 AM
I just did some quick programming before I upload V3. The graphical interface now remembers the last options entered into it and automatically fills them in. Running the program and pressing options will create a SavedOptions file in the program's folder. This will store the data for as long as it exists, being overwritten with every use.

The program checks for its existence before trying to use its stored data so deleting this file will not break the program, it will simply restore the option boxes to Default. (useful if you changed the ignored files by removing CIV4EffectInfos.xml and it is causing problems for you in the mod, but you couldn't remember what you needed to enter in! though what are the chances of that really :p)
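
A minimal sketch of remembering options this way, assuming the file is written with pickle (the "pickle jar" mentioned earlier). DEFAULTS, load_options and save_options are illustrative names, and the real SavedOptions layout is not shown in the thread.

```python
import copy
import os
import pickle

# Hypothetical defaults, modelled on the options block in the first post.
DEFAULTS = {
    "xmlPath": "",
    "backupPath": "",
    "lIgnores": ["CIV4EffectInfos.xml"],
}

def load_options(path):
    """Return the saved options, or the defaults if no saved file exists."""
    if not os.path.isfile(path):
        return copy.deepcopy(DEFAULTS)
    with open(path, "rb") as handle:
        return pickle.load(handle)

def save_options(path, options):
    """Overwrite the saved options file with the current options."""
    with open(path, "wb") as handle:
        pickle.dump(options, handle)
```

Deleting the file simply sends load_options back to DEFAULTS, which matches the behaviour described above.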

j_mie6
May 12, 2012, 11:46 AM
Uploaded V3!

Change log

Added translation button to the GUI
Options are now remembered by the GUI

Known bugs
Italian and Spanish translations are not working... not sure why yet, developers of MyGengo haven't given me feedback :rolleyes:

dacubz145
May 12, 2012, 12:32 PM
When I run it, it says invalid text reference for every single thing it translates. Looking in the new text file, it actually works really well. Translates very nicely, but not sure why it says invalid text reference

j_mie6
May 12, 2012, 12:47 PM
that's what it says when it has found something it needs to fix :p, guess it could be worded better.

the command is:

print "Invalid Text Reference to xml detected:", tag.childNodes[0].data

basically saying this xml 'object' does not use standard naming convention/isn't referenced in the mod's text files.

Nothing to worry about, I would be more worried if it said nothing :lol:
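
The kind of check behind that message might look like the sketch below. It assumes a reference counts as valid when it starts with TXT_KEY_ and exists in the mod's text files; find_invalid_references is a made-up helper, not the cleaner's own function.

```python
def find_invalid_references(values, known_tags):
    """Return the node values that are not usable text references."""
    known = set(known_tags)
    return [value for value in values
            if not (value.startswith("TXT_KEY_") and value in known)]

known_tags = ["TXT_KEY_VENICE_DESC"]
values = ["TXT_KEY_VENICE_DESC", "Venice", "TXT_KEY_VENICE_PEDIA"]
invalid = find_invalid_references(values, known_tags)
```

Here invalid holds the plain text "Venice" and the key that has no TEXT entry yet, which are the two cases the cleaner would need to fix.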

dacubz145
May 12, 2012, 12:54 PM
oh haha thats good

But it is a really slow process, it took about 15 minutes to do 21

Couple things though
1) Accents and things like the two dots above O's and U's in German, doesn't civ not like that? Whenever you put those into civ, it gives text errors (random ? and !), and don't you need to do something special for them in the text files??
2) So i set it to my mod/assets/xml/units but it did not do my unitinfos xml, not sure why :confused: and i didn't say not to do it

dacubz145
May 12, 2012, 12:59 PM
^^ at #1 above
for example, in civ the german O with the two dots above, that in the xml text needs to be

&!246;

The ! is really a #, had to switch them cuz otherwise it showed up as an ö

j_mie6
May 12, 2012, 01:08 PM
pasted from my own mods xml:


<TEXT>
<Tag>TXT_KEY_BUILDING_ROMAN_MONASTERY</Tag>
<English>Roman Monastery</English>
<French>Roman monastère</French>
<German>römischen Kloster</German>
<Italian>Monastero romano</Italian>
<Spanish>Monasterio romano</Spanish>
</TEXT>


in game it is identical... I never put in the special characters, even when doing it by hand :p Probably shouldn't but meh :lol:

Just so you know, you set it to Assets/XML, not necessarily Assets/XML/Units (though it shouldn't affect it...) does it not say in the log that it is checking the file? try running it again :p If that doesn't work I will have to make you a debug version that will tell me exactly what it is doing in your files :p

21 files? well considering you are using all your references wrong that isn't surprising timewise! it would be faster if it wasn't doing active translation because that can take time. ummmm in the main program's file replace:



def createXMLFile(filename, tag, enValue, bLanguages): #if no text file is present, creates a xml text file!
    doc = mini.Document()
    doc.appendChild(doc.createComment("created with XML text fixer.py, J_mie6"))
    nodeMain = doc.createElement("Civ4GameText")
    nodeMain.setAttribute("xmlns", "http://www.firaxis.com")
    doc.appendChild(nodeMain)

    nodeText = doc.createElement("TEXT")
    nodeMain.appendChild(nodeText)

    nodeTag = doc.createElement("Tag")
    nodeTag.appendChild(doc.createTextNode(tag))
    nodeText.appendChild(nodeTag)

    nodeEnglish = doc.createElement("English")
    nodeEnglish.appendChild(doc.createTextNode(enValue))
    nodeText.appendChild(nodeEnglish)

    if bLanguages[0]:
        nodeFrench = doc.createElement("French")
        nodeFrench.appendChild(doc.createTextNode(Translator.getFrench(enValue)))
        nodeText.appendChild(nodeFrench)

    if bLanguages[1]:
        nodeGerman = doc.createElement("German")
        nodeGerman.appendChild(doc.createTextNode(Translator.getGerman(enValue)))
        nodeText.appendChild(nodeGerman)

    if bLanguages[2]:
        nodeItalian = doc.createElement("Italian")
        nodeItalian.appendChild(doc.createTextNode(Translator.getItalian(enValue)))
        nodeText.appendChild(nodeItalian)

    if bLanguages[3]:
        nodeSpanish = doc.createElement("Spanish")
        nodeSpanish.appendChild(doc.createTextNode(Translator.getSpanish(enValue)))
        nodeText.appendChild(nodeSpanish)

    newXML = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub('>\g<1></', doc.toprettyxml('\t', '\n', 'ISO-8859-1')) #formats the code :p
    text = open(filename, "w") #create the xml document
    text.write(newXML) #puts xml inside
    text.close() #closes the xml document

def createTag(filename, tag, enValue, doc, bLanguages): #creates a new tag and appends it to the xml text file
    nodeMain = doc.childNodes[1]
    nodeText = doc.createElement("TEXT")
    nodeMain.appendChild(nodeText)

    nodeTag = doc.createElement("Tag")
    nodeTag.appendChild(doc.createTextNode(tag))
    nodeText.appendChild(nodeTag)

    nodeEnglish = doc.createElement("English")
    nodeEnglish.appendChild(doc.createTextNode(enValue))
    nodeText.appendChild(nodeEnglish)

    if bLanguages[0]:
        nodeFrench = doc.createElement("French")
        nodeFrench.appendChild(doc.createTextNode(Translator.getFrench(enValue)))
        nodeText.appendChild(nodeFrench)

    if bLanguages[1]:
        nodeGerman = doc.createElement("German")
        nodeGerman.appendChild(doc.createTextNode(Translator.getGerman(enValue)))
        nodeText.appendChild(nodeGerman)

    if bLanguages[2]:
        nodeItalian = doc.createElement("Italian")
        nodeItalian.appendChild(doc.createTextNode(Translator.getItalian(enValue)))
        nodeText.appendChild(nodeItalian)

    if bLanguages[3]:
        nodeSpanish = doc.createElement("Spanish")
        nodeSpanish.appendChild(doc.createTextNode(Translator.getSpanish(enValue)))
        nodeText.appendChild(nodeSpanish)

    newXML = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub('>\g<1></', doc.toprettyxml('\t', '\n', 'ISO-8859-1')) #formats the code :p
    string = '' #used for whitespace removal
    for line in newXML.split('\n'): #removes weird whitespace problem
        if line.strip():
            string += line + '\n'
    text = open(filename, "w") #open the xml document
    text.write(string) #puts xml inside
    text.close() #closes the xml document




with



def createXMLFile(filename, tag, enValue, bLanguages): #if no text file is present, creates a xml text file!
    doc = mini.Document()
    doc.appendChild(doc.createComment("created with XML text fixer.py, J_mie6"))
    nodeMain = doc.createElement("Civ4GameText")
    nodeMain.setAttribute("xmlns", "http://www.firaxis.com")
    doc.appendChild(nodeMain)

    nodeText = doc.createElement("TEXT")
    nodeMain.appendChild(nodeText)

    nodeTag = doc.createElement("Tag")
    nodeTag.appendChild(doc.createTextNode(tag))
    nodeText.appendChild(nodeTag)

    nodeEnglish = doc.createElement("English")
    nodeEnglish.appendChild(doc.createTextNode(enValue))
    nodeText.appendChild(nodeEnglish)

    if bLanguages[0]:
        nodeFrench = doc.createElement("French")
        nodeFrench.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeFrench)

    if bLanguages[1]:
        nodeGerman = doc.createElement("German")
        nodeGerman.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeGerman)

    if bLanguages[2]:
        nodeItalian = doc.createElement("Italian")
        nodeItalian.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeItalian)

    if bLanguages[3]:
        nodeSpanish = doc.createElement("Spanish")
        nodeSpanish.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeSpanish)

    newXML = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub('>\g<1></', doc.toprettyxml('\t', '\n', 'ISO-8859-1')) #formats the code :p
    text = open(filename, "w") #create the xml document
    text.write(newXML) #puts xml inside
    text.close() #closes the xml document

def createTag(filename, tag, enValue, doc, bLanguages): #creates a new tag and appends it to the xml text file
    nodeMain = doc.childNodes[1]
    nodeText = doc.createElement("TEXT")
    nodeMain.appendChild(nodeText)

    nodeTag = doc.createElement("Tag")
    nodeTag.appendChild(doc.createTextNode(tag))
    nodeText.appendChild(nodeTag)

    nodeEnglish = doc.createElement("English")
    nodeEnglish.appendChild(doc.createTextNode(enValue))
    nodeText.appendChild(nodeEnglish)

    if bLanguages[0]:
        nodeFrench = doc.createElement("French")
        nodeFrench.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeFrench)

    if bLanguages[1]:
        nodeGerman = doc.createElement("German")
        nodeGerman.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeGerman)

    if bLanguages[2]:
        nodeItalian = doc.createElement("Italian")
        nodeItalian.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeItalian)

    if bLanguages[3]:
        nodeSpanish = doc.createElement("Spanish")
        nodeSpanish.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeSpanish)

    newXML = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub('>\g<1></', doc.toprettyxml('\t', '\n', 'ISO-8859-1')) #formats the code :p
    string = '' #used for whitespace removal
    for line in newXML.split('\n'): #removes weird whitespace problem
        if line.strip():
            string += line + '\n'
    text = open(filename, "w") #open the xml document
    text.write(string) #puts xml inside
    text.close() #closes the xml document




this disables active translation so it will run faster, bear in mind that after the first run the program runs faster anyway as it isn't finding as many errors to replace! I will work on an option that will enable/disable active translation.
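
One way such an enable/disable option could be wired in, instead of swapping whole functions, is a small helper that decides what goes into each language node. This is only a sketch: choose_text and the translators mapping are hypothetical, and the lambda below stands in for the real Translator methods.

```python
def choose_text(en_value, language, translators, translate=True):
    """Return the text for one language node, falling back to English."""
    if not translate:
        return en_value              # active translation switched off
    translator = translators.get(language)
    if translator is None:
        return en_value              # no translator known for this language
    try:
        return translator(en_value)
    except Exception:
        # a failed request leaves the English text in place
        return en_value

def broken_service(text):
    raise IOError("no reply from the translation service")

# Stand-in translators, for illustration only.
translators = {"French": lambda text: "FR:" + text, "Italian": broken_service}

french = choose_text("Venice", "French", translators)
italian = choose_text("Venice", "Italian", translators)
untranslated = choose_text("Venice", "French", translators, translate=False)
```

With a helper like this, createXMLFile and createTag would each need only one version, and the four if blocks could call it with the language name.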

dacubz145
May 12, 2012, 01:22 PM
Not 21 files, 21 items, (so like 8 promotions, 3 builds etc)

also if i set it to just assets/xml it only detects like 7 files and none of them needed translations

and even when i set it to assets/xml/units it only detects 8 files, but it only does 4! im not sure why, i even set it so it does not check those 4, and it still checks those 4 and not the other four

also problems with the text are random. I just added a Baha'i religion, where the second A and the I have an accent. In the religion it looks great. When i make it a Baha'i missionary, the text shows up all funny, it looks like bahA!i? and i have no idea why :lol:

When translating you definitely need to convert it, otherwise it really is not going to work. I mean i can do it myself, figuring out what is what and just find and replace, but for other people that don't know how to do that, it's going to mess up their translations

j_mie6
May 12, 2012, 01:26 PM
well, if you can give me a list of those special characters I can build in converters to the translator :p

definitely gonna have to make a debug version for you... btw it detects any type of file present but only interacts with xml files... however it could be there are permission problems... I am gonna work on a debug version right now :D

dacubz145
May 12, 2012, 01:29 PM
if you want, i can just post all the xml in my mod, and you can just run it and post it back since it just xml doesnt take up much space, whichever is easier

j_mie6
May 12, 2012, 01:31 PM
well I could but if the problem is your end then I still won't know :p

dacubz145
May 12, 2012, 01:39 PM
Any way you could just run my xml quickly now? Because im going out soon and I really want to update my mod soon, so I really could use it translated quickly. But after that I am definitely going to keep using the xml editor so i can definitely keep posting feedback

j_mie6
May 12, 2012, 02:21 PM
... I am working on that.. :p for some reason it is saying your building defines is not well formed... :confused:

also you might like to merge ALL of those text files together!!!! (plus the cleaner cannot detect folders within the Text folder...) over 250! my computer is struggling to even open the folder (gonna merge them myself.) I'm having some sort of problem which I wasn't having with my own files and I will attempt to get your files working by the end of the night (or tomorrow night)... it is gonna be a loooooooooong night

j_mie6
May 12, 2012, 02:39 PM
right, the reason it doesn't like opening your text files is this:

ExpatError: not well-formed (invalid token): line 40, column 16

and that happens to be:

<Text>Ninevh</Text>

yup, those special characters need to be used for at least some of the accented letters. other similar ones work, but not these ones...

do you have that list? I can't continue without those special characters...

I remember having this problem before with xml... when I was writing my French verb dictionary I found the xml parser refused to accept some accents... in response to this I had to make up special notations in the xml for the program to decode, i.e. e(>) and e(`) standing in for accented e's

this means the xml MUST follow the special character conventions... and that will take time, as the cleaner can't sort this out itself: it can't parse the xml files, it can only enter the characters correctly when it writes them... this means the user has to make sure the XML uses the special characters throughout... I am going to have to experiment around with C++ to see if it can just go in and do some replacing (or if there is another way...)
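
The parser behaviour can be reproduced with a few bytes. In this sketch the accented letter (byte 0xE9, an é in cp1252) is a stand-in, since the exact character in the failing file isn't known, and can_parse is a made-up helper. It also shows that a file declaring its encoding parses fine, which is general XML behaviour and not something tested with the cleaner.

```python
import xml.dom.minidom as mini
from xml.parsers.expat import ExpatError

def can_parse(data):
    """Return True if minidom accepts the given XML bytes."""
    try:
        mini.parseString(data)
        return True
    except ExpatError:
        return False

# No encoding declared: the parser assumes UTF-8 and chokes on the lone 0xE9 byte.
raw = b"<Text>Ninev\xe9h</Text>"
# The same text with the character written as a numeric reference.
escaped = b"<Text>Ninev&#233;h</Text>"
# The same raw byte again, but with the encoding stated up front.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><Text>Ninev\xe9h</Text>'

results = [can_parse(raw), can_parse(escaped), can_parse(declared)]
```

Only the first of the three is rejected.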



in any case I don't yet understand why the cleaner isn't working properly for you. it might be because of this or because of permissions... I am going to work very hard to get this fixed and working asap!!!

dacubz145
May 12, 2012, 02:42 PM
there is no list, i just looked in the civ 4 original text files for it, im sure someone knows if u make a post about it in the quick modding questions thread

j_mie6
May 12, 2012, 03:08 PM
I got them :p

turns out they are the html codes for special characters... now I am going to have to write a program that will fix the text files :p be it python (which I think is possible, I just checked) or faster C++ (which may not be possible, I will ask my C++ programming friend tomorrow)

once the text files are fixed it should be nice and easy for the cleaner to do its cleaning :D. thank god you helped me find this bug!

j_mie6
May 12, 2012, 03:49 PM
this simple program will convert all the characters :p very fast too :p


# -*- coding: cp1252 -*-
fileread = open("Assyria_CIV4GameText.xml", "r")
string = fileread.read()
fileread.close()

accentMap = {"À" : "&#192;", "Á" : "&#193;", "Â" : "&#194;", "Ã" : "&#195;", "Ä" : "&#196;", "Å" : "&#197;", "Æ" : "&#198;", "Ç" : "&#199;", "È" : "&#200;",\
             "É" : "&#201;", "Ê" : "&#202;", "Ë" : "&#203;", "Ì" : "&#204;", "Í" : "&#205;", "Î" : "&#206;", "Ï" : "&#207;", "Ð" : "&#208;", "Ñ" : "&#209;",\
             "Ò" : "&#210;", "Ó" : "&#211;", "Ô" : "&#212;", "Õ" : "&#213;", "Ö" : "&#214;", "×" : "&#215;", "Ø" : "&#216;", "Ù" : "&#217;", "Ú" : "&#218;",\
             "Û" : "&#219;", "Ü" : "&#220;", "Ý" : "&#221;", "Þ" : "&#222;", "ß" : "&#223;", "à" : "&#224;", "á" : "&#225;", "â" : "&#226;", "ã" : "&#227;",\
             "ä" : "&#228;", "å" : "&#229;", "æ" : "&#230;", "ç" : "&#231;", "è" : "&#232;", "é" : "&#233;", "ê" : "&#234;", "ë" : "&#235;", "ì" : "&#236;",\
             "í" : "&#237;", "î" : "&#238;", "ï" : "&#239;", "ð" : "&#240;", "ñ" : "&#241;", "ò" : "&#242;", "ó" : "&#243;", "ô" : "&#244;", "õ" : "&#245;",\
             "ö" : "&#246;", "ø" : "&#248;", "ù" : "&#249;", "ú" : "&#250;", "û" : "&#251;", "ü" : "&#252;", "ý" : "&#253;", "þ" : "&#254;", "ÿ" : "&#255;"}

nstring = ""
for char in string:
    if char in accentMap.keys():
        char = accentMap[char]
    nstring = nstring + char

filewrite = open("Assyria_CIV4GameText.xml", "w")
filewrite.write(nstring)
filewrite.close()


now to implement this into my cleaner somehow...

edit: of course the second string in each of those pairs is the html notation (&#...;), which the forum likes to display as the plain character :p

btw might I suggest that the name of the building's art file is changed to not have an á in it: Palácio do Planalto

dacubz145
May 12, 2012, 03:52 PM
very nice work:D

and :lol: i was confused why there were no codes, but yes of course it shows the actual letters :lol:

j_mie6
May 12, 2012, 04:06 PM
see my edit lol, I am going to have to get the program to pass over every file in the XML folder anyway, but this means the program will change the Palácio do Planalto's define to include the html code instead, messing up your whole art system. either it must be ignored in the options or you name it Palacio do Planalto

dacubz145
May 12, 2012, 04:27 PM
EDIT: nevermind i misunderstood you

can't you do a check, and only change the accent ones if they are between the <language></language>, and not between the <tag></tag>

j_mie6
May 12, 2012, 04:57 PM
well yes but then the program would complain. that accent is still in the xml and the program can't load the file. the only way around it is to ignore the file or to remove that accent.

still working on fixing all this stuff btw :p

dacubz145
May 12, 2012, 05:01 PM
I would just remove it then, dont see why it matters. If the pedia tag doesnt have the accent, it wont matter as long as the actual text still does, id say remove it

j_mie6
May 12, 2012, 05:09 PM
no that's not the point lol. in your building artdefines, the Palacio's actual path name contains an accent. the program will "correct" this and then the path no longer points to where your art is.

the stuff regarding text is all fine, no harm anywhere, but paths need to be accent free or set to ignore.

and after noting that if the program can load a file it doesn't fix it, I realised some files with acceptable accents will be left. so I made it do all of them, and as it checks each character individually it takes a loooooooooong time. guess I will leave acceptable accents in :p this setup also means that the program will only run slow the first time it's used, as afterwards accents that conflict should not appear in the files :D

The_J
May 12, 2012, 05:32 PM

It's easier to convert these characters into their HTML equivalents ^^.

def htmlconvert(string):
    newString = ""
    for char in string:
        if ord(char) > 127:
            newString = newString + '&#' + str(ord(char)) + ';'
        else:
            newString = newString + char
    return newString

j_mie6
May 12, 2012, 05:45 PM
what's the advantage of this method over mine? I'm guessing it doesn't need a coding declaration?

dacubz145
May 12, 2012, 05:46 PM
Ohh i didnt know it was in my art defines, yeah that can be removed no problem, just need to change it in the buildings art defines as well

EDIT: also by the looks of it, J's way looks shorter, and you definitely want to make it as fast as possible, but idk python so i can be wrong

j_mie6
May 12, 2012, 05:49 PM
I am still looking to convert it to C++ though for even faster speed, as it does take a long time when going through 355 massive files :p plus The_J's code is shorter as it only deals with the string and not the files themselves, but that accounts for only 6 or 7 lines more...

The_J
May 12, 2012, 06:04 PM
Right, right. In my full code, the reading + writing of the file is handled elsewhere.

what's the advantage of this method over mine? I'm guessing it doesn't need a coding declaration?

Think it still needs one.
The advantage is that you don't need a dictionary, and will catch even stuff which you never thought about. No need to care about what characters are in there, you can't even forget one, the code will deal with it, no matter what.
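
To make that concrete, here is a compact variant of the same idea written for unicode strings, next to a deliberately tiny dictionary. html_convert and small_map are illustrative names only, and the sample word and the letter Ł are made-up test data.

```python
def html_convert(text):
    """Replace every non-ASCII character with its numeric HTML code."""
    return "".join("&#%d;" % ord(char) if ord(char) > 127 else char
                   for char in text)

# A dictionary only knows the characters somebody remembered to list.
small_map = {u"\xf6": "&#246;"}          # just the o with two dots

word = u"r\xf6mischen"                   # contains the o with two dots
unlisted = u"\u0141"                     # a character that is not in small_map

converted = html_convert(word)
converted_unlisted = html_convert(unlisted)
in_map = unlisted in small_map
```

The dictionary has no entry for the second character, while the ord() based function converts it without any extra work.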

j_mie6
May 12, 2012, 06:14 PM
hmmm ok, though I think I have all of the html special chars in the dictionary

j_mie6
May 13, 2012, 05:50 AM
ok, in the interest of speed, if the program says a file is incompatible you then click the button on the GUI which fixes all files to use the html codes. however the way this will hopefully be faster is that the code being run is something different :p


os.system("C:\\Users\\User\\Documents\\Programming\\C++\\fixer.exe")


this runs a C++ application that will fix the files way way faster. just gotta wait till my friend gets back from climbing to ask him how I would do the script in C++

of course the Translator will automatically run The_J's code when it is translating mods text :p

j_mie6
May 13, 2012, 08:22 AM
ok so I have C++ code that replaces the accents in a given string with the html code... (can't work out how to use a method similar to The_J's yet to cast the character without the map)

just need to work on opening the files etc.



#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <sstream>
using namespace std;

/*
Made by J_mie6 with help from The_J
This program is designed to be run via the XML cleaner to fix incompatible files;
Originally made in python this program ran very slowly and hopefully this C++ version
will drastically increase the "fixing" of these files containing characters that the XML
parser cannot handle (thus stopping the program from working properly!)
*/

map<const string, int> values;

void setValues()
{
    values["À"] = 192; values["Á"] = 193; values["Â"] = 194; values["Ã"] = 195; values["Ä"] = 196; values["Å"] = 197; values["Æ"] = 198; values["Ç"] = 199; values["È"] = 200;
    values["É"] = 201; values["Ê"] = 202; values["Ë"] = 203; values["Ì"] = 204; values["Í"] = 205; values["Î"] = 206; values["Ï"] = 207; values["Ð"] = 208; values["Ñ"] = 209;
    values["Ò"] = 210; values["Ó"] = 211; values["Ô"] = 212; values["Õ"] = 213; values["Ö"] = 214; values["×"] = 215; values["Ø"] = 216; values["Ù"] = 217; values["Ú"] = 218;
    values["Û"] = 219; values["Ü"] = 220; values["Ý"] = 221; values["Þ"] = 222; values["ß"] = 223; values["à"] = 224; values["á"] = 225; values["â"] = 226; values["ã"] = 227;
    values["ä"] = 228; values["å"] = 229; values["æ"] = 230; values["ç"] = 231; values["è"] = 232; values["é"] = 233; values["ê"] = 234; values["ë"] = 235; values["ì"] = 236;
    values["í"] = 237; values["î"] = 238; values["ï"] = 239; values["ð"] = 240; values["ñ"] = 241; values["ò"] = 242; values["ó"] = 243; values["ô"] = 244; values["õ"] = 245;
    values["ö"] = 246; values["ø"] = 248; values["ù"] = 249; values["ú"] = 250; values["û"] = 251; values["ü"] = 252; values["ý"] = 253; values["þ"] = 254; values["ÿ"] = 255;
}

string convertStringToInt(string value)
{
stringstream ss;
ss << values[value];
return ss.str();
}

string getHtmlValue(string character)
{
return "&#" + convertStringToInt(character) + ";";
}

int main ()
{

setValues();

string base = "héllo";
string html = getHtmlValue("é");

string str = base;
str.replace(1, 1, html);

cout<< str<< endl;
cin.get();
return 0;
}

j_mie6
May 13, 2012, 12:40 PM
update:

The C++ program is finished aside from getting the input from Python (which is just a big string of filenames that will be checked). C++ can't find them itself, and I am reluctant to write the names into a file made by Python to then be read by C++; I'd rather Python 'piped' them in. However, I can't work out how C++ receives the pipe itself yet.



#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <vector>
#include <sstream>
#include <stdio.h>
#include <stdlib.h>
using namespace std;

/*
Made by J_mie6 with help from The_J
This program is designed to be run via the XML cleaner to fix incompatible files;
Originally made in Python, this program ran very slowly; hopefully this C++ version
will drastically speed up the "fixing" of these files containing characters that the XML
parser cannot handle (thus stopping the program from working properly!)
*/

map<const char, int> values;

void setValues()
{
// each key must fit in a single char, so this file has to be saved and compiled as Windows-1252
values['À'] = 192; values['Á'] = 193; values['Â'] = 194; values['Ã'] = 195; values['Ä'] = 196; values['Å'] = 197; values['Æ'] = 198; values['Ç'] = 199; values['È'] = 200;
values['É'] = 201; values['Ê'] = 202; values['Ë'] = 203; values['Ì'] = 204; values['Í'] = 205; values['Î'] = 206; values['Ï'] = 207; values['Ð'] = 208; values['Ñ'] = 209;
values['Ò'] = 210; values['Ó'] = 211; values['Ô'] = 212; values['Õ'] = 213; values['Ö'] = 214; values['×'] = 215; values['Ø'] = 216; values['Ù'] = 217; values['Ú'] = 218;
values['Û'] = 219; values['Ü'] = 220; values['Ý'] = 221; values['Þ'] = 222; values['ß'] = 223; values['à'] = 224; values['á'] = 225; values['â'] = 226; values['ã'] = 227;
values['ä'] = 228; values['å'] = 229; values['æ'] = 230; values['ç'] = 231; values['è'] = 232; values['é'] = 233; values['ê'] = 234; values['ë'] = 235; values['ì'] = 236;
values['í'] = 237; values['î'] = 238; values['ï'] = 239; values['ð'] = 240; values['ñ'] = 241; values['ò'] = 242; values['ó'] = 243; values['ô'] = 244; values['õ'] = 245;
values['ö'] = 246; values['ø'] = 248; values['ù'] = 249; values['ú'] = 250; values['û'] = 251; values['ü'] = 252; values['ý'] = 253; values['þ'] = 254; values['ÿ'] = 255;
values['¡'] = 161; values['¿'] = 191; values['÷'] = 247; values['Œ'] = 338; values['œ'] = 339; values['Š'] = 352; values['š'] = 353; values['Ÿ'] = 376; values['ƒ'] = 402;
}

string convertCharToInt(char value)
{
stringstream ss;
ss << values[value];
return ss.str();
}

string getHtmlValue(char character)
{
return "&#" + convertCharToInt(character) + ";";
}

bool isCharSpecial(char c)
{
return values.count(c)>0;
}

string getTextFromFile(const char* filename)
{
vector<string> text;
string line;
ifstream textstream (filename);
while (getline(textstream, line)) {
text.push_back(line + "\n");
}
textstream.close();
string alltext;
for (int i=0; i < text.size(); i++){
alltext += text[i];
}
return alltext;
}

void writeTextToFile(const char* filename, string data)
{
ofstream file;
file.open (filename);
file << data;
file.close();

}

vector <string> splitInput(string data, char separator)
{
// split on the separator, dropping empty entries (e.g. after a trailing comma)
istringstream ss( data );
vector <string> vData;
string x;
while (getline( ss, x, separator ))
{
if (!x.empty())
{
vData.push_back(x);
}
}

return vData;
}

int main (int argc, char *argv[])
{
// the comma-separated list of files is expected as the first argument
if (argc < 2)
{
cout << "Usage: " << argv[0] << " <file1,file2,...>" << endl;
return 1;
}
string files = argv[1];

vector <string> vFiles = splitInput(files, ',');
setValues();

for (size_t x=0; x<vFiles.size(); x++)
{
const char* filename = vFiles[x].c_str();
cout << "Correcting: "<< filename << endl;
string str = getTextFromFile(filename);
string html;

for (size_t i=0; i<str.length(); i++)
{
if (isCharSpecial(str[i]))
{
html = getHtmlValue(str[i]);
str.replace(i, 1, html);
// jump past the inserted code; it contains no special characters
i += html.length() - 1;
}
}

writeTextToFile(filename, str);

}
cin.get();
return 0;
}
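
The replace loop in main edits the string in place, which shifts the rest of the text on every hit. A possible alternative, sketched here under the assumption that the table is passed in rather than kept global, builds a new string in one pass. The name escapeSpecial is made up, and auto and std::to_string need a C++11 compiler.

```cpp
#include <map>
#include <string>

// Returns a copy of `text` in which every byte found in `table` is replaced
// by its numeric character reference, e.g. "&#233;".
std::string escapeSpecial(const std::string& text, const std::map<char, int>& table)
{
    std::string out;
    out.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        auto it = table.find(text[i]);
        if (it == table.end()) {
            out += text[i];  // ordinary character, copy as-is
        } else {
            out += "&#" + std::to_string(it->second) + ";";
        }
    }
    return out;
}
```

Because nothing is inserted into the string being scanned, the index never needs adjusting and big files avoid the repeated shifting.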



then after this is finished I hook it up to the GUI as well as The_J's script and try and get your files working...

edit: well, we got the arguments to send over, however Windows fails to send all of your file names over as the list exceeds the size restriction!!! (and that is being sent as a string :p) This is why you gotta merge them :p Still, after testing, the C++ version works much faster, and that is with C++'s slow printing telling you what file it is checking. edit: in fact it runs all the files (bar the 280+ text files) in less than a minute :D

would you mind merging all your text files into like 10 or so and sending them back to me so I can continue?

dacubz145
Jun 16, 2012, 05:25 PM
quick thought

does this check for things in []? since when you have [tab] or [dot] you do not want that translated

j_mie6
Jun 17, 2012, 04:15 AM
It doesn't actually, but that makes sense. I will implement this when I have got the next version!
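
One way the bracket check could work, as a rough sketch only: split each text into bracketed tags and ordinary text, and hand only the ordinary pieces to the translator. The Segment struct and splitTags function are hypothetical names, not part of the cleaner, and the brace initialisation needs a C++11 compiler.

```cpp
#include <string>
#include <vector>

// One piece of a text entry: either a bracketed tag such as "[tab]" that must
// be kept verbatim, or ordinary text that may be translated.
struct Segment {
    std::string text;
    bool isTag;
};

std::vector<Segment> splitTags(const std::string& input)
{
    std::vector<Segment> parts;
    std::string::size_type pos = 0;
    while (pos < input.size()) {
        std::string::size_type open = input.find('[', pos);
        std::string::size_type close =
            (open == std::string::npos) ? std::string::npos : input.find(']', open);
        if (close == std::string::npos) {
            // No complete tag left: the rest is ordinary text.
            parts.push_back(Segment{input.substr(pos), false});
            break;
        }
        if (open > pos) {
            parts.push_back(Segment{input.substr(pos, open - pos), false});
        }
        parts.push_back(Segment{input.substr(open, close - open + 1), true});
        pos = close + 1;
    }
    return parts;
}
```

For "Gold[tab]5" this yields three pieces, "Gold", "[tab]" and "5", with only the middle one flagged as a tag, so the pieces can be joined back together in order after translation.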