XML text cleaner

I did some quick programming before uploading V3: the graphical interface now remembers the last options entered into it and fills them in automatically. Running the program and pressing Options creates a SavedOptions file in the program's folder. The file keeps your settings for as long as it exists and is overwritten every time you use the program.

The program checks that the file exists before trying to use its stored data, so deleting it will not break the program; it will simply reset the option boxes to their defaults. (Useful if you changed the ignored files by removing CIV4EffectInfos.xml, it is causing problems for you in the mod, and you can't remember what you need to enter! Though what are the chances of that, really :p)
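A minimal sketch of the "remember the last options, fall back to defaults if the file is gone" behaviour described above. The JSON format, the option names and the default values are all assumptions for illustration; the thread does not show what the real SavedOptions file contains.

```python
import json
import os

# Hypothetical option names and defaults; the cleaner's real ones are not shown in the thread.
DEFAULT_OPTIONS = {
    "xml_folder": "Assets/XML",
    "ignored_files": ["CIV4EffectInfos.xml"],
    "languages": [True, True, True, True],  # French, German, Italian, Spanish
}

OPTIONS_FILE = "SavedOptions"


def save_options(options, path=OPTIONS_FILE):
    """Overwrite the saved options file with the current GUI values."""
    with open(path, "w") as f:
        json.dump(options, f)


def load_options(path=OPTIONS_FILE):
    """Return the saved options, or a copy of the defaults if the file is missing."""
    options = dict(DEFAULT_OPTIONS)
    if not os.path.exists(path):
        return options  # no saved file: the option boxes go back to their defaults
    with open(path) as f:
        options.update(json.load(f))  # any key missing from the file keeps its default
    return options
```

Checking for the file before reading it is what makes deleting SavedOptions safe: the program never tries to open a file that isn't there.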
 
Uploaded V3!

Change log

Added translation button to the GUI
Options are now remembered by the GUI

Known bugs
Italian and Spanish translations are not working... not sure why yet; the developers of MyGengo haven't given me any feedback :rolleyes:
 
That's what it says when it has found something it needs to fix :p I guess it could be worded better.

the command is:

print "Invalid Text Reference to xml detected:", tag.childNodes[0].data

Basically, it is saying that this XML 'object' does not use the standard naming convention or isn't referenced in the mod's text files.
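A rough sketch of that kind of check, assuming the cleaner compares each <Description> value in an info file against the <Tag> values defined in the text files. Which fields the real cleaner inspects, and the "Roman Forum" entry, are assumptions made up for the example.

```python
import xml.dom.minidom as mini


def defined_text_keys(text_xml):
    """Collect every <Tag> value from a Civ4GameText document (given as a string)."""
    doc = mini.parseString(text_xml)
    return {node.firstChild.data.strip()
            for node in doc.getElementsByTagName("Tag") if node.firstChild}


def invalid_references(info_xml, known_keys, field="Description"):
    """Return the <field> values that are not a defined TXT_KEY_ reference."""
    doc = mini.parseString(info_xml)
    bad = []
    for node in doc.getElementsByTagName(field):
        if not node.firstChild:
            continue  # empty element, nothing to check
        value = node.firstChild.data.strip()
        # flag values that break the naming convention or have no text entry
        if not value.startswith("TXT_KEY_") or value not in known_keys:
            bad.append(value)
    return bad


text_xml = """<Civ4GameText><TEXT><Tag>TXT_KEY_BUILDING_ROMAN_MONASTERY</Tag>
<English>Roman Monastery</English></TEXT></Civ4GameText>"""
info_xml = """<BuildingInfos>
<BuildingInfo><Description>TXT_KEY_BUILDING_ROMAN_MONASTERY</Description></BuildingInfo>
<BuildingInfo><Description>Roman Forum</Description></BuildingInfo>
</BuildingInfos>"""

for value in invalid_references(info_xml, defined_text_keys(text_xml)):
    print("Invalid Text Reference to xml detected:", value)
```

Here only the hard-coded "Roman Forum" description is reported, since it is plain text rather than a TXT_KEY_ reference.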

Nothing to worry about, I would be more worried if it said nothing :lol:
 
Oh haha, that's good.

But it is a really slow process: it took about 15 minutes to do 21.

A couple of things, though:
1) Accents, and things like the two dots above O's and U's in German: doesn't Civ dislike those? Whenever you put them into Civ, it gives text errors (random ? and !). Don't you need to do something special for them in the text files?
2) I set it to my mod/assets/xml/units, but it did not do my UnitInfos XML file. Not sure why :confused:, and I didn't tell it to skip that file.
 
Pasted from my own mod's XML:

Code:
	<TEXT>
		<Tag>TXT_KEY_BUILDING_ROMAN_MONASTERY</Tag>
		<English>Roman Monastery</English>
		<French>Roman monastère</French>
		<German>römischen Kloster</German>
		<Italian>Monastero romano</Italian>
		<Spanish>Monasterio romano</Spanish>
	</TEXT>

In game it is identical... I never put in the special characters, even when doing it by hand :p Probably shouldn't do that, but meh :lol:

Just so you know, you set it to Assets/XML, not necessarily Assets/XML/Units (though that shouldn't affect it...). Does the log not say that it is checking the file? Try running it again :p If that doesn't work, I will have to make you a debug version that tells me exactly what it is doing with your files :p

21 files? Well, considering you are using all your references wrong, that isn't surprising time-wise! It would be faster without active translation, because that can take time. Ummmm, in the main program file replace:


Code:
def createXMLFile(filename, tag, enValue, bLanguages): #if no text file is present, creates a xml text file!
    doc = mini.Document()
    doc.appendChild(doc.createComment("created with XML text fixer.py, J_mie6"))
    nodeMain = doc.createElement("Civ4GameText")
    nodeMain.setAttribute("xmlns", "http://www.firaxis.com")
    doc.appendChild(nodeMain)
    
    nodeText = doc.createElement("TEXT")
    nodeMain.appendChild(nodeText)
    
    nodeTag = doc.createElement("Tag")
    nodeTag.appendChild(doc.createTextNode(tag))
    nodeText.appendChild(nodeTag)
    
    nodeEnglish = doc.createElement("English")
    nodeEnglish.appendChild(doc.createTextNode(enValue))
    nodeText.appendChild(nodeEnglish)
    
    if bLanguages[0]:
        nodeFrench = doc.createElement("French")
        nodeFrench.appendChild(doc.createTextNode(Translator.getFrench(enValue)))
        nodeText.appendChild(nodeFrench)

    if bLanguages[1]:
        nodeGerman = doc.createElement("German")
        nodeGerman.appendChild(doc.createTextNode(Translator.getGerman(enValue)))
        nodeText.appendChild(nodeGerman)

    if bLanguages[2]:
        nodeItalian = doc.createElement("Italian")
        nodeItalian.appendChild(doc.createTextNode(Translator.getItalian(enValue)))
        nodeText.appendChild(nodeItalian)
        
    if bLanguages[3]:
        nodeSpanish = doc.createElement("Spanish")
        nodeSpanish.appendChild(doc.createTextNode(Translator.getSpanish(enValue)))
        nodeText.appendChild(nodeSpanish)

    newXML = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub(r'>\g<1></', doc.toprettyxml('\t', '\n', 'ISO-8859-1')) #pulls each text value back onto the same line as its tags
    text = open(filename, "w") #create the xml document
    text.write(newXML) #puts xml inside
    text.close() #closes the xml document

def createTag(filename, tag, enValue, doc, bLanguages): #creates a new tag and appends it to the xml text file
    nodeMain = doc.documentElement #the root Civ4GameText element, even if the file has no leading comment
    nodeText = doc.createElement("TEXT")
    nodeMain.appendChild(nodeText)
    
    nodeTag = doc.createElement("Tag")
    nodeTag.appendChild(doc.createTextNode(tag))
    nodeText.appendChild(nodeTag)
    
    nodeEnglish = doc.createElement("English")
    nodeEnglish.appendChild(doc.createTextNode(enValue))
    nodeText.appendChild(nodeEnglish)
    
    if bLanguages[0]:
        nodeFrench = doc.createElement("French")
        nodeFrench.appendChild(doc.createTextNode(Translator.getFrench(enValue)))
        nodeText.appendChild(nodeFrench)

    if bLanguages[1]:
        nodeGerman = doc.createElement("German")
        nodeGerman.appendChild(doc.createTextNode(Translator.getGerman(enValue)))
        nodeText.appendChild(nodeGerman)

    if bLanguages[2]:
        nodeItalian = doc.createElement("Italian")
        nodeItalian.appendChild(doc.createTextNode(Translator.getItalian(enValue)))
        nodeText.appendChild(nodeItalian)
        
    if bLanguages[3]:
        nodeSpanish = doc.createElement("Spanish")
        nodeSpanish.appendChild(doc.createTextNode(Translator.getSpanish(enValue)))
        nodeText.appendChild(nodeSpanish)

    newXML = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub(r'>\g<1></', doc.toprettyxml('\t', '\n', 'ISO-8859-1')) #pulls each text value back onto the same line as its tags
    string = '' #used for whitespace removal
    for line in newXML.split('\n'): #removes weird whitespace problem
        if line.strip():
            string += line + '\n'
    text = open(filename, "w") #open the xml document
    text.write(string) #puts xml inside
    text.close() #closes the xml document


with


Code:
def createXMLFile(filename, tag, enValue, bLanguages): #if no text file is present, creates a xml text file!
    doc = mini.Document()
    doc.appendChild(doc.createComment("created with XML text fixer.py, J_mie6"))
    nodeMain = doc.createElement("Civ4GameText")
    nodeMain.setAttribute("xmlns", "http://www.firaxis.com")
    doc.appendChild(nodeMain)
    
    nodeText = doc.createElement("TEXT")
    nodeMain.appendChild(nodeText)
    
    nodeTag = doc.createElement("Tag")
    nodeTag.appendChild(doc.createTextNode(tag))
    nodeText.appendChild(nodeTag)
    
    nodeEnglish = doc.createElement("English")
    nodeEnglish.appendChild(doc.createTextNode(enValue))
    nodeText.appendChild(nodeEnglish)
    
    if bLanguages[0]:
        nodeFrench = doc.createElement("French")
        nodeFrench.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeFrench)

    if bLanguages[1]:
        nodeGerman = doc.createElement("German")
        nodeGerman.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeGerman)

    if bLanguages[2]:
        nodeItalian = doc.createElement("Italian")
        nodeItalian.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeItalian)
        
    if bLanguages[3]:
        nodeSpanish = doc.createElement("Spanish")
        nodeSpanish.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeSpanish)

    newXML = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub(r'>\g<1></', doc.toprettyxml('\t', '\n', 'ISO-8859-1')) #pulls each text value back onto the same line as its tags
    text = open(filename, "w") #create the xml document
    text.write(newXML) #puts xml inside
    text.close() #closes the xml document

def createTag(filename, tag, enValue, doc, bLanguages): #creates a new tag and appends it to the xml text file
    nodeMain = doc.documentElement #the root Civ4GameText element, even if the file has no leading comment
    nodeText = doc.createElement("TEXT")
    nodeMain.appendChild(nodeText)
    
    nodeTag = doc.createElement("Tag")
    nodeTag.appendChild(doc.createTextNode(tag))
    nodeText.appendChild(nodeTag)
    
    nodeEnglish = doc.createElement("English")
    nodeEnglish.appendChild(doc.createTextNode(enValue))
    nodeText.appendChild(nodeEnglish)
    
    if bLanguages[0]:
        nodeFrench = doc.createElement("French")
        nodeFrench.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeFrench)

    if bLanguages[1]:
        nodeGerman = doc.createElement("German")
        nodeGerman.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeGerman)

    if bLanguages[2]:
        nodeItalian = doc.createElement("Italian")
        nodeItalian.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeItalian)
        
    if bLanguages[3]:
        nodeSpanish = doc.createElement("Spanish")
        nodeSpanish.appendChild(doc.createTextNode(enValue))
        nodeText.appendChild(nodeSpanish)

    newXML = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL).sub(r'>\g<1></', doc.toprettyxml('\t', '\n', 'ISO-8859-1')) #pulls each text value back onto the same line as its tags
    string = '' #used for whitespace removal
    for line in newXML.split('\n'): #removes weird whitespace problem
        if line.strip():
            string += line + '\n'
    text = open(filename, "w") #open the xml document
    text.write(string) #puts xml inside
    text.close() #closes the xml document


This disables active translation (each language element just gets a copy of the English text), so it will run faster. Bear in mind that after the first run the program is faster anyway, as it isn't finding as many errors to replace! I will work on an option to enable/disable active translation.
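One possible shape for that enable/disable option, as a Python 3 sketch: a single helper replaces the four copied language blocks, translating when a translator function is supplied and copying the English text otherwise. The helper name and the `translators` dict are hypothetical; with the existing module it might be called with something like {"French": Translator.getFrench, ...}, but that wiring is an assumption.

```python
import xml.dom.minidom as mini

LANGUAGES = ["French", "German", "Italian", "Spanish"]  # same order as bLanguages


def add_language_nodes(doc, nodeText, enValue, bLanguages, translators=None):
    """Append one element per enabled language to a TEXT node.

    translators: optional dict mapping a language name to a function that
    translates English text. When it is None (active translation off),
    every enabled language simply gets a copy of the English text.
    """
    for enabled, language in zip(bLanguages, LANGUAGES):
        if not enabled:
            continue
        if translators is not None and language in translators:
            value = translators[language](enValue)
        else:
            value = enValue
        node = doc.createElement(language)
        node.appendChild(doc.createTextNode(value))
        nodeText.appendChild(node)


doc = mini.Document()
entry = doc.createElement("TEXT")
add_language_nodes(doc, entry, "Roman Monastery", [True, True, False, False])
print(entry.toxml())
```

Keeping the language loop in one place means the fast and translating versions of createXMLFile and createTag no longer have to be maintained as separate copies.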
 
Not 21 files, 21 items (so like 8 promotions, 3 builds, etc.).

Also, if I set it to just assets/xml it only detects about 7 files, and none of them needed translations.

And even when I set it to assets/xml/units it only detects 8 files, but it only does 4! I'm not sure why. I even set it to not check those 4, and it still checks those 4 and not the other four.

Also, the problems with the text are random. I just added a Baha'i religion, where the second A and the I have accents. For the religion itself it looks great, but when I make a Baha'i missionary, the text shows up all funny; it looks like bahA!i? and I have no idea why :lol:

When translating you definitely need to convert them, otherwise it really is not going to work. I can do it myself, figuring out what is what and using find and replace, but for other people who don't know how to do that, it's going to mess up their translations.
 
Well, if you can give me a list of those special characters, I can build converters into the translator :p

I'm definitely going to have to make a debug version for you... By the way, it detects any type of file present but only interacts with XML files... However, it could be that there are permission problems... I am going to work on a debug version right now :D
 
Well, I could, but if the problem is on your end then I still won't know :p
 
Any way you could just run my XML quickly now? I'm going out soon and I really want to update my mod, so I could really use it translated quickly. But after that I am definitely going to keep using the XML editor, so I can keep posting feedback.
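A debug pass could start with something like the Python 3 sketch below, which lists every file it finds (including in subfolders) and says whether it would be checked, skipped as non-XML, or skipped because it is on the ignore list. The function name and the case-insensitive name matching are assumptions, not how the current cleaner works.

```python
import os


def scan_xml_files(root, ignored=()):
    """Walk root and all its subfolders, reporting what would happen to each file.

    Returns a list of (path, action) pairs, where action is
    "check", "ignored" or "not xml".
    """
    ignored_lower = {name.lower() for name in ignored}
    report = []
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            if not name.lower().endswith(".xml"):
                report.append((path, "not xml"))
            elif name.lower() in ignored_lower:
                report.append((path, "ignored"))
            else:
                report.append((path, "check"))
    return report


if __name__ == "__main__":
    for path, action in scan_xml_files("Assets/XML", ignored=["CIV4EffectInfos.xml"]):
        print(action, path)
```

A log like this would show straight away whether a file such as the UnitInfos XML is being found at all, or found and then skipped.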
 

Attachments

  • XML.rar (3.6 MB)
... I am working on that.. :p For some reason it is saying your building defines file is not well-formed... :confused:

Also, you might like to merge ALL of those text files together!!!! There are over 250 of them (plus the cleaner cannot detect folders within the Text folder...)! My computer is struggling to even open the folder (I'm going to merge them myself). I'm having some sort of problem which I wasn't having with my own files, and I will attempt to get your files working by the end of the night (or tomorrow night)... it is going to be a loooooooooong night
 
Right, the reason it doesn't like opening your text files is this:
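Merging could be done along these lines, as a Python 3 sketch using xml.dom.minidom: every TEXT entry from a list of files is copied into one new Civ4GameText document. Keeping only the first entry for a repeated Tag is a choice made for this example; the thread doesn't say how duplicates should be handled.

```python
import xml.dom.minidom as mini


def merge_text_files(paths):
    """Combine the TEXT entries of several Civ4GameText files into one document.

    If the same Tag appears more than once, only the first entry is kept.
    """
    merged = mini.Document()
    root = merged.createElement("Civ4GameText")
    root.setAttribute("xmlns", "http://www.firaxis.com")
    merged.appendChild(root)
    seen = set()
    for path in paths:
        doc = mini.parse(path)
        for entry in doc.getElementsByTagName("TEXT"):
            tags = entry.getElementsByTagName("Tag")
            key = tags[0].firstChild.data.strip() if tags and tags[0].firstChild else None
            if key is not None:
                if key in seen:
                    continue  # duplicate Tag: keep the earlier entry
                seen.add(key)
            # importNode copies the entry (and its children) into the merged document
            root.appendChild(merged.importNode(entry, True))
    return merged
```

The result can then be written out with toprettyxml, as the cleaner already does for the files it creates.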

ExpatError: not well-formed (invalid token): line 40, column 16

and that happens to be:

<Text>Ninevêh</Text>

Yup, those special characters need to be used for at least some of the accented letters. é and similar ones work, but not these ones...
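A small Python 3 sketch of why a numeric character reference gets past the parser. It assumes one likely cause of the error: the file has no encoding declaration, so expat reads it as UTF-8, and ê is stored as the single Latin-1 byte 0xEA, which is not valid UTF-8. The same word written as &#234; is plain ASCII and parses cleanly.

```python
import xml.dom.minidom as mini
from xml.parsers.expat import ExpatError

# "ê" stored as one Latin-1 byte (0xEA) in data with no encoding declaration
raw = b"<Text>Ninev\xeah</Text>"

# the same word using the numeric character reference for ê
escaped = b"<Text>Ninev&#234;h</Text>"


def try_parse(data):
    """Return the element's text, or a short description of the parse error."""
    try:
        doc = mini.parseString(data)
    except ExpatError as err:
        return "ExpatError at line %d, column %d" % (err.lineno, err.offset)
    return doc.documentElement.firstChild.data


print(try_parse(raw))      # an ExpatError report
print(try_parse(escaped))  # Ninevêh
```

The parser turns &#234; back into ê when it reads the file, so the text itself is unchanged.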

do you have that list? I can't continue without those special characters...

I remember having this problem before with XML... when I was writing my French verb dictionary, I found the XML parser refused to accept some accents. In response, I had to make special character codes in the XML for the program to decode, i.e. é was e(>) and è was e(`).

This means the XML MUST follow the special character conventions... and that will take time, as the program can't sort this out: it can't parse the XML files, it can only enter the characters correctly. So the user has to make sure the XML all uses special characters... I am going to have to experiment with C++ to see if it can just go in and do some replacing (or if there is another way...)



In any case, I don't yet understand why the cleaner isn't working properly for you. It might be because of this or because of permissions... I am going to work very hard to get this fixed and working ASAP!!!
 
I got them :p

Turns out they are the HTML codes for special characters... Now I am going to have to write a program that will fix the text files :p, be it in Python (which I think is possible, I just checked) or the faster C++ (which may not be possible; I will ask my C++ programming friend tomorrow).

Once the text files are fixed, it should be nice and easy for the cleaner to do its cleaning :D. Thank God you helped me find this bug!
 
This simple program will convert all the characters :p Very fast too :p

Code:
# -*- coding: cp1252 -*-
fileread = open("Assyria_CIV4GameText.xml", "r")
string = fileread.read()
fileread.close()

#each accented character mapped to its numeric character reference
accentMap = {"À" : "&#192;", "Á" : "&#193;", "Â" : "&#194;", "Ã" : "&#195;", "Ä" : "&#196;", "Å" : "&#197;", "Æ" : "&#198;", "Ç" : "&#199;", "È" : "&#200;",
             "É" : "&#201;", "Ê" : "&#202;", "Ë" : "&#203;", "Ì" : "&#204;", "Í" : "&#205;", "Î" : "&#206;", "Ï" : "&#207;", "Ð" : "&#208;", "Ñ" : "&#209;",
             "Ò" : "&#210;", "Ó" : "&#211;", "Ô" : "&#212;", "Õ" : "&#213;", "Ö" : "&#214;", "×" : "&#215;", "Ø" : "&#216;", "Ù" : "&#217;", "Ú" : "&#218;",
             "Û" : "&#219;", "Ü" : "&#220;", "Ý" : "&#221;", "Þ" : "&#222;", "ß" : "&#223;", "à" : "&#224;", "á" : "&#225;", "â" : "&#226;", "ã" : "&#227;",
             "ä" : "&#228;", "å" : "&#229;", "æ" : "&#230;", "ç" : "&#231;", "è" : "&#232;", "é" : "&#233;", "ê" : "&#234;", "ë" : "&#235;", "ì" : "&#236;",
             "í" : "&#237;", "î" : "&#238;", "ï" : "&#239;", "ð" : "&#240;", "ñ" : "&#241;", "ò" : "&#242;", "ó" : "&#243;", "ô" : "&#244;", "õ" : "&#245;",
             "ö" : "&#246;", "ø" : "&#248;", "ù" : "&#249;", "ú" : "&#250;", "û" : "&#251;", "ü" : "&#252;", "ý" : "&#253;", "þ" : "&#254;", "ÿ" : "&#255;"}

nstring = "".join([accentMap.get(char, char) for char in string]) #swaps each mapped character, leaves everything else as-is

filewrite = open("Assyria_CIV4GameText.xml", "w")
filewrite.write(nstring)
filewrite.close()

Now to implement this into my cleaner somehow...

edit: of course the second item in each of those pairs is meant to be the HTML character code (e.g. &#233; for é), not the character itself :p

By the way, might I suggest renaming the building's art file so it doesn't have an á in it: Palácio do Planalto
 
See my edit lol. I am going to have to get the program to pass over every file in the XML folder anyway, but this means the program will change Palácio do Planalto's define to include the HTML code instead, messing up your whole art system. Either that file must be ignored in the options, or the building should be renamed Palacio do Planalto.
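Building the converter into the cleaner could look roughly like this Python 3 sketch: the same À to ÿ conversion (skipping ÷, as the list above does) applied to every .xml file under a folder, with an ignore list so a file such as the art defines can be left alone. The function names, the Latin-1 file encoding and the ignore-list handling are assumptions; Latin-1 is used because it round-trips every byte, so only the accented characters change.

```python
import os


def to_char_refs(text):
    """Replace every character from U+00C0 to U+00FF, except ÷, with &#NNN;."""
    out = []
    for ch in text:
        code = ord(ch)
        if 192 <= code <= 255 and code != 247:
            out.append("&#%d;" % code)
        else:
            out.append(ch)
    return "".join(out)


def convert_folder(root, ignored=(), encoding="latin-1"):
    """Rewrite every .xml file under root with character references.

    Files whose names are in `ignored` are left untouched. Returns the paths
    that were changed. newline="" keeps the original line endings.
    """
    ignored_lower = {name.lower() for name in ignored}
    changed = []
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            if not name.lower().endswith(".xml") or name.lower() in ignored_lower:
                continue
            path = os.path.join(folder, name)
            with open(path, encoding=encoding, newline="") as f:
                original = f.read()
            converted = to_char_refs(original)
            if converted != original:
                with open(path, "w", encoding=encoding, newline="") as f:
                    f.write(converted)
                changed.append(path)
    return changed
```

Note that a file already saved as UTF-8 would be misread under the Latin-1 assumption, so the encoding needs checking before running this over a whole mod.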
 