XML text cleaner

well yes but then the program would complain. that accent is still in the xml and the program can't load the file. the only way around it is to ignore the file or to remove that accent.

still working on fixing all this stuff btw :p
 
no, that's not the point lol. in your art defines, the Palacio building's actual path name contains an accent. the program will correct this and then your art is in the wrong place.

the stuff regarding text is all fine, no harm anywhere, but paths need to be accent free or set to ignore.
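For the path side of this, here is a rough sketch of how an accent-free name could be worked out. It is only an illustration: `strip_accents` is a made-up helper name (not part of the cleaner), the example path is invented, and it assumes Python 3 with the path already decoded to text.

```python
import unicodedata


def strip_accents(path):
    """Return path with combining accents removed (hypothetical helper)."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", path)
    # Drop the combining marks and keep every other character untouched.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Invented example path, only to show the call.
print(strip_accents("Art/Pal\u00e1cio do Planalto.nif"))
```

Letters that have no base-plus-accent decomposition (ß, Ø, Æ and similar) pass through unchanged, so they would still need separate handling.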

and after noticing that the program doesn't fix a file it can already load, I realised some files with acceptable accents would be left alone. so I made it process all of them, and as it checks each character individually it takes a loooooooooong time. guess I will leave acceptable accents in :p this setup also means that the program will only run slowly the first time it's used, as after that conflicting accents should not appear in the files :D
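The "only fix a file if it fails to load" check could look something like the sketch below. Which XML parser the cleaner really uses isn't stated in this thread, so `xml.etree.ElementTree` is an assumption here, and `needs_fixing` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def needs_fixing(raw_bytes):
    """Return True when the parser rejects the data (hypothetical helper)."""
    try:
        ET.fromstring(raw_bytes)
    except ET.ParseError:
        return True
    return False


# A lone cp1252 accent byte in a file without an encoding declaration
# is not valid UTF-8, so the parser refuses it.
print(needs_fixing(b"<Text>h\xe9llo</Text>"))
# The same text with a numeric character reference parses normally.
print(needs_fixing(b"<Text>h&#233;llo</Text>"))
```

Files that pass this check would be skipped, which matches the "leave acceptable accents in" decision above.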
 
this simple program will convert all the characters :p very fast too :p

Code:
# -*- coding: cp1252 -*-
fileread = open("Assyria_CIV4GameText.xml", "r")
string = fileread.read()
fileread.close()

accentMap = {"À" : "À", "Á" : "Á", "Â" : "Â", "Ã" : "Ã", "Ä" : "Ä", "Å" : "Å", "Æ" : "Æ", "Ç" : "Ç", "È" : "È",\
             "É" : "É", "Ê" : "Ê", "Ë" : "Ë", "Ì" : "Ì", "Í" : "Í", "Î" : "Î", "Ï" : "Ï", "Ð" : "Ð", "Ñ" : "Ñ",\
             "Ò" : "Ò", "Ó" : "Ó", "Ô" : "Ô", "Õ" : "Õ", "Ö" : "Ö", "×" : "×", "Ø" : "Ø", "Ù" : "Ù", "Ú" : "Ú",\
             "Û" : "Û", "Ü" : "Ü", "Ý" : "Ý", "Þ" : "Þ", "ß" : "ß", "à" : "à", "á" : "á", "â" : "â", "ã" : "ã",\
             "ä" : "ä", "å" : "å", "æ" : "æ", "ç" : "ç", "è" : "è", "é" : "é", "ê" : "ê", "ë" : "ë", "ì" : "ì",\
             "í" : "í", "î" : "î", "ï" : "ï", "ð" : "ð", "ñ" : "ñ", "ò" : "ò", "ó" : "ó", "ô" : "ô", "õ" : "õ",\
             "ö" : "ö", "ø" : "ø", "ù" : "ù", "ú" : "ú", "û" : "û", "ü" : "ü", "ý" : "ý", "þ" : "þ", "ÿ" : "ÿ"}

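# collect the pieces in a list; repeated string concatenation is slow
pieces = []
for char in string:
    # a direct dict lookup avoids searching accentMap.keys() for every character
    pieces.append(accentMap.get(char, char))
nstring = "".join(pieces)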

filewrite = open("Assyria_CIV4GameText.xml", "w")
filewrite.write(nstring)
filewrite.close()

now to implement this into my cleaner somehow...

edit: of course the second string in each of those pairs is really the HTML notation for that character, but the forum renders it back into the character so you can't see it :p
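Since the forum hides the real values, here is a sketch of what a dictionary-based version could look like in Python 3. Assumptions: the values are written as numeric references (the original post may have used a different notation), the map is generated for the whole range U+00C0 to U+00FF instead of being typed out, and `convert_accents` is a made-up name.

```python
# Assumption: every character from U+00C0 to U+00FF gets a numeric reference.
# This range also includes the division sign, which the posted dictionary omits.
accent_map = {chr(cp): "&#%d;" % cp for cp in range(0xC0, 0x100)}
table = str.maketrans(accent_map)


def convert_accents(text):
    """Replace every mapped character in a single pass (hypothetical helper)."""
    return text.translate(table)


print(convert_accents("Pal\u00e1cio do Planalto"))
```

`str.translate` does the per-character loop inside the interpreter, so it tends to be quicker than a hand-written loop over the string.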

btw might I suggest that the name of the building's art file is changed to not have an á in it: Palácio do Planalto

It's easier to convert these characters into their HTML equivalents ^^.
Code:
def htmlconvert(string):
        newString =""
        for char in string:
                if ord(char)>127:
                        newString = newString+'&#'+str(ord(char))+';'
                else:
                        newString = newString+char
        return newString
 
what are the advantages of this method over mine? I'm guessing it doesn't need a coding declaration?
 
Ohh I didn't know it was in my art defines, yeah that can be removed no problem, just need to change it in the building's art defines as well

EDIT: also by the looks of it, J's way looks shorter, and you definitely want to make it as fast as possible, but idk Python so I could be wrong
 
I am still looking to convert it to C++ though for even more speed, as it does take a long time when going through 355 massive files :p plus The_J's code is shorter because it only deals with the string and not the files themselves, but that accounts for only 6 or 7 lines more...
 
Right, right. In my full code, the reading + writing of the file is handled elsewhere.

what are the advantages of this method over mine? I'm guessing it doesn't need a coding declaration?

Think it still needs one.
The advantage is that you don't need a dictionary, and will catch even stuff which you never thought about. No need to care about what characters are in there, you can't even forget one, the code will deal with it, no matter what.
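As a side note on the "no dictionary" approach: Python's standard encoder can do the same replacement on its own. A minimal sketch, assuming Python 3 and text that has already been decoded to a string (`to_char_refs` is a made-up name, not code from this thread):

```python
def to_char_refs(text):
    """Replace every non-ASCII character with &#NNN; (hypothetical helper)."""
    # "xmlcharrefreplace" is a standard error handler for str.encode: any
    # character the target encoding cannot hold becomes a numeric reference.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_char_refs("h\u00e9llo"))
```

Like the loop version, this catches characters nobody thought to list, because it works from the code point and not from a table.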
 
hmmm ok, though apart from ÷ I have all of the HTML special chars in the dictionary I think
 
ok, in the interest of speed: if the program says a file is incompatible, you then click the button on the GUI which fixes all files to use the HTML codes. the way this will hopefully be faster is that the code it runs is something different :p

Code:
import os
os.system("C:\\Users\\User\\Documents\\Programming\\C++\\fixer.exe")

this runs a C++ application that will fix the files way way faster. just gotta wait till my friend gets back from climbing to ask him how I would do the script in C++

of course the Translator will automatically run The_J's code when it is translating mods text :p
 
ok so I have C++ code that replaces the accents in a given string with the HTML code... (can't work out yet how to use a method similar to The_J's, i.e. casting the character without the map)

just need to work on opening the files etc.


Code:
#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <sstream>
using namespace std;

/*
   Made by J_mie6 with help from The_J
   This program is designed to be run via the XML cleaner to fix incompatible files;
   Originally made in python this program ran very slowly and hopefully this C++ version
   will drastically increase the "fixing" of these files containing characters that the XML
   parser cannot handle (thus stopping the program from working properly!)
*/

map<const string, int> values;

void setValues()
{
     values["À"] = 192; values["Á"] = 193; values["Â"] = 194; values["Ã"] = 195; values["Ä"] = 196; values["Å"] = 197; values["Æ"] = 198; values["Ç"] = 199; values["È"] = 200;
     values["É"] = 201; values["Ê"] = 202; values["Ë"] = 203; values["Ì"] = 204; values["Í"] = 205; values["Î"] = 206; values["Ï"] = 207; values["Ð"] = 208; values["Ñ"] = 209;
     values["Ò"] = 210; values["Ó"] = 211; values["Ô"] = 212; values["Õ"] = 213; values["Ö"] = 214; values["×"] = 215; values["Ø"] = 216; values["Ù"] = 217; values["Ú"] = 218;
     values["Û"] = 219; values["Ü"] = 220; values["Ý"] = 221; values["Þ"] = 222; values["ß"] = 223; values["à"] = 224; values["á"] = 225; values["â"] = 226; values["ã"] = 227;
     values["ä"] = 228; values["å"] = 229; values["æ"] = 230; values["ç"] = 231; values["è"] = 232; values["é"] = 233; values["ê"] = 234; values["ë"] = 235; values["ì"] = 236;
     values["í"] = 237; values["î"] = 238; values["ï"] = 239; values["ð"] = 240; values["ñ"] = 241; values["ò"] = 242; values["ó"] = 243; values["ô"] = 244; values["õ"] = 245;
     values["ö"] = 246; values["ø"] = 248; values["ù"] = 249; values["ú"] = 250; values["û"] = 251; values["ü"] = 252; values["ý"] = 253; values["þ"] = 254; values["ÿ"] = 255;
}

string convertStringToInt(string value)
{
       stringstream ss;
       ss << values[value];
       return ss.str();
}

string getHtmlValue(string character)
{
       return "&#" + convertStringToInt(character) + ";";
}

int main ()
{

setValues();

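// note: replace(1, 1, ...) assumes 'é' is a single byte, i.e. this source
// file is saved in a single-byte encoding such as cp1252, not UTF-8
string base = "héllo";
string html = getHtmlValue("é");

string str = base;
str.replace(1, 1, html);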

cout<< str<< endl;
cin.get();
return 0;
}
 
update:

the C++ program is finished aside from getting the input from Python (which is just a big string of filenames that will be checked). C++ can't find them itself, and I am reluctant to have Python write the names into a file for C++ to read; I'd rather Python 'piped' them in. however, I can't work out how C++ receives the pipe itself yet.
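On the Python side, piping the names in could be sketched like this. It assumes Python 3.7 or newer, `pipe_filenames` is a made-up name, and the child process here is a small stand-in script, not the real fixer. On the receiving end a pipe just shows up as standard input, so the C++ program would read the names from `std::cin` (for example one line at a time with `std::getline`).

```python
import subprocess
import sys


def pipe_filenames(command, filenames):
    """Send one filename per line to the child's stdin; return its stdout."""
    result = subprocess.run(
        command,
        input="\n".join(filenames) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in child process: a tiny Python script that counts the lines it receives.
# The real program would be the C++ fixer reading from standard input.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read().splitlines()))"]
print(pipe_filenames(child, ["a.xml", "b.xml"]))
```

Because the names travel through the pipe and not on the command line, their total length is not tied to any command-line size limit.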


Code:
#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <vector>
#include <sstream>
#include <stdio.h>
#include <stdlib.h>
using namespace std;

/*
   Made by J_mie6 with help from The_J
   This program is designed to be run via the XML cleaner to fix incompatible files;
   Originally made in python this program ran very slowly and hopefully this C++ version
   will drastically increase the "fixing" of these files containing characters that the XML
   parser cannot handle (thus stopping the program from working properly!)
*/

map<const char, int> values;

void setValues()
{
     values['À'] = 192; values['Á'] = 193; values['Â'] = 194; values['Ã'] = 195; values['Ä'] = 196; values['Å'] = 197; values['Æ'] = 198; values['Ç'] = 199; values['È'] = 200;
     values['É'] = 201; values['Ê'] = 202; values['Ë'] = 203; values['Ì'] = 204; values['Í'] = 205; values['Î'] = 206; values['Ï'] = 207; values['Ð'] = 208; values['Ñ'] = 209;
     values['Ò'] = 210; values['Ó'] = 211; values['Ô'] = 212; values['Õ'] = 213; values['Ö'] = 214; values['×'] = 215; values['Ø'] = 216; values['Ù'] = 217; values['Ú'] = 218;
     values['Û'] = 219; values['Ü'] = 220; values['Ý'] = 221; values['Þ'] = 222; values['ß'] = 223; values['à'] = 224; values['á'] = 225; values['â'] = 226; values['ã'] = 227;
     values['ä'] = 228; values['å'] = 229; values['æ'] = 230; values['ç'] = 231; values['è'] = 232; values['é'] = 233; values['ê'] = 234; values['ë'] = 235; values['ì'] = 236;
     values['í'] = 237; values['î'] = 238; values['ï'] = 239; values['ð'] = 240; values['ñ'] = 241; values['ò'] = 242; values['ó'] = 243; values['ô'] = 244; values['õ'] = 245;
     values['ö'] = 246; values['ø'] = 248; values['ù'] = 249; values['ú'] = 250; values['û'] = 251; values['ü'] = 252; values['ý'] = 253; values['þ'] = 254; values['ÿ'] = 255;
     values['¡'] = 161; values['¿'] = 191; values['÷'] = 247; values['Œ'] = 338; values['œ'] = 339; values['Š'] = 352; values['š'] = 353; values['Ÿ'] = 376; values['ƒ'] = 402;
}

string convertCharToInt(char value)
{
       stringstream ss;
       ss << values[value];
       return ss.str();
}

string getHtmlValue(char character)
{
       return "&#" + convertCharToInt(character) + ";";
}

bool isCharSpecial(char c)
{
     return values.count(c)>0;
}

string getTextFromFile(const char* filename)
{
    vector<string> text;
    string line;
    ifstream textstream (filename);
    while (getline(textstream, line)) {
          text.push_back(line + "\n");
    } 
    textstream.close();
    string alltext;
    for (int i=0; i < text.size(); i++){
        alltext += text[i];
    }
    return alltext;
}

void writeTextToFile(const char* filename, string data)
{
     ofstream file;
     file.open (filename);
     file << data;
     file.close();

}

vector <string> splitInput(string data, char separator)
{
     istringstream ss( data );
     vector <string> vData;
     string x;
     // test getline itself instead of eof(), and skip empty pieces so a
     // trailing separator does not produce a bogus empty filename
     while (getline( ss, x, separator ))
     {
           if (!x.empty())
               vData.push_back(x);
     }

     return vData;
}

int main (int argc, char *argv[])
{
    // expect one argument: a comma-separated list of filenames
    if (argc < 2)
    {
        cerr << "usage: fixer file1,file2,..." << endl;
        return 1;
    }
    string files = argv[1];

    vector <string> vFiles = splitInput(files, ',');
    setValues();

    for (size_t x=0; x<vFiles.size(); x++)
    {
        const char* filename = vFiles[x].c_str();
        cout << "Correcting: "<< filename << endl;
        string str = getTextFromFile(filename);

        for (size_t i=0; i<str.length(); i++)
        {
            if (isCharSpecial(str[i]))
            {
                string html = getHtmlValue(str[i]);
                str.replace(i, 1, html);
                i += html.length() - 1; // skip past the inserted reference
            }
        }

        writeTextToFile(filename, str);
    }
    cin.get();
    return 0;
}


then after this is finished I hook it up to the GUI as well as The_J's script and try and get your files working...

edit: well, we got the arguments to send over, however Windows fails to send all of your file names as the string exceeds the size restriction!!! (and that is being sent as a single string :p) this is why you gotta merge them :p still, after testing, the C++ version works much faster, and that is with C++'s slow printing telling you what file it is checking. edit: in fact it runs all the files (bar the 280+ text files) in less than a minute :D
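One possible workaround for the size restriction, sketched here only as an idea, is to split the filename list into several smaller comma-separated strings and launch the fixer once per string. `batch_filenames` is a made-up name, and the default of 8000 characters is an assumed conservative value; the real limit depends on how the program is launched.

```python
def batch_filenames(filenames, max_chars=8000, separator=","):
    """Group filenames into joined strings no longer than max_chars each.

    Hypothetical helper. A single name longer than max_chars still gets
    its own batch instead of being dropped.
    """
    batches = []
    current = []
    current_len = 0
    for name in filenames:
        # Length this name adds, counting the separator if one is needed.
        extra = len(name) + (len(separator) if current else 0)
        if current and current_len + extra > max_chars:
            # The current batch is full: store it and start a new one.
            batches.append(separator.join(current))
            current = [name]
            current_len = len(name)
        else:
            current.append(name)
            current_len += extra
    if current:
        batches.append(separator.join(current))
    return batches


# Small limit chosen only to make the splitting visible.
print(batch_filenames(["aaaa", "bbbb", "cccc"], max_chars=9))
```

Each returned string could then be passed as the single argument to one run of the fixer.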

would you mind merging all your text files into like 10 or so and sending them back to me so I can continue?
 
It doesn't actually, that makes sense, I will implement this when I have got the next version!
 