XML Autocorrect Tool

billw2015

This tool will do semi-automated grammar and spelling correction on Text XML files.
Usage:
  1. Have Java installed (most people probably do?)
  2. Run Tools\Autocorrect\StartServer.bat
    This starts the grammar and spelling server that the tool uses, a local version of the LanguageTool service. However, I only use this for grammar and for detecting initial spelling errors, because its dictionary is way too limited for our eclectic needs. I added a couple of other spell checkers on top of it to give better accuracy, but the default is set to just use Google search for spell checking.
  3. Drag and drop Text XML files on to Tools\Autocorrect\DropXMLHere.bat
    You can also run Autocorrect.bat from the command line and see the available options.
The server is quite slow (it does about one piece of text per second or so) (EDIT: not true anymore, it is very fast in some cases, and caches results), but it picks up a lot of things beyond spelling, including incorrect unit conversions and some bad writing style (!).

Using Google as the spell check means it shouldn't hit many false positives for spelling, but it will miss some things, like incorrect capitalization. Basically, if you can type it into Google and Google doesn't say "did you mean xyz?", it won't be picked up as an error in this system by default.

The interface is just simple keyboard + console, with some color added to make it easier to understand. When the tool finds errors in a TEXT entry, it will present all the corrections and let you choose either to apply them all or to interactively select which ones you want to use. Of course, you can skip, ignore and exit at the appropriate places as well.
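
To make the "apply all or pick some" step concrete, here is a minimal sketch in Python. It is not the tool's actual code: `apply_corrections` and the `(offset, length, replacement)` tuple layout are assumptions made for illustration, and the corrections are assumed not to overlap.

```python
def apply_corrections(text, corrections, selected=None):
    """Apply (offset, length, replacement) corrections to text.

    selected is a set of indexes into corrections, or None to apply them all.
    Hypothetical helper for illustration; corrections must not overlap.
    """
    chosen = [
        correction
        for index, correction in enumerate(corrections)
        if selected is None or index in selected
    ]
    # Work from the end of the string so earlier offsets stay valid.
    for offset, length, replacement in sorted(chosen, reverse=True):
        text = text[:offset] + replacement + text[offset + length:]
    return text
```

Calling it with `selected=None` corresponds to "apply them all", while passing something like `selected={0, 2}` corresponds to picking individual corrections interactively.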

[Screenshot: upload_2019-9-21_22-45-4.png]

 
This didn't work:
I ran StartServer.bat, let it do its stuff, then dragged and dropped a text XML file onto DropXMLHere.bat, and got an error.
StartServer.bat failed to install things properly: it says path not found.
[Screenshots: Dwm 2019-09-22 09-57-39-32.png, Dwm 2019-09-22 09-57-36-33.png]

The Python folder is named pythonx86 and not python.
It seems the folder wasn't renamed.
So I force-renamed pythonx86 to python in the Tools folder.
Apparently the script recreated that folder, lol.

Now it works.

The best way to do it is to open two folders.
The first one displays the Autocorrect folder contents, so you can drop files onto the .bat file.
The second one can be opened in the assets folder. Type *text*.xml in the search bar to find all the text files.
There are a lot of text files scattered across the modules.
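
If you would rather list those files from a script than from the Explorer search bar, something like the following Python sketch should do the same job as the *text*.xml search. The function name `find_text_xml` and the "Assets" path are assumptions for illustration, not part of the tool.

```python
from pathlib import Path


def find_text_xml(root):
    """Return every .xml file under root whose name contains 'text' (any case)."""
    root = Path(root)
    matches = [
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() == ".xml"
        and "text" in path.stem.lower()
    ]
    return sorted(matches)


if __name__ == "__main__":
    # "Assets" is an assumed folder name; point this at the mod's assets folder.
    for path in find_text_xml("Assets"):
        print(path)
```

The search is recursive, so files scattered across module subfolders are picked up as well.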
 
Ah okay, I guess you have a 32-bit OS? I will modify the batch file to do the rename then.
 
@billw2015 it crashes on larger files.

[Screenshot: Dwm 2019-09-22 10-47-44-07.png]

For example, it can't fully process P2K_CIV4GameText, which is 96 KB; the biggest text files are closer to 1 MB.
 
Hmm, PowerShell thinks you have a 32-bit OS, so it installed the x86 version of Python, hence the directory name. Weird.
 
It shouldn't crash on a "too many requests" error; instead it should retry the request.
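
A rough Python sketch of the retry behaviour being asked for, using exponential backoff. This is only an illustration under assumptions: `TooManyRequests` and `call_with_retry` are hypothetical names, and how the tool actually talks to Google is not shown in this thread.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical error raised when the remote service answers 'too many requests'."""


def call_with_retry(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request(); on TooManyRequests wait and try again, doubling the delay."""
    for attempt in range(max_attempts):
        try:
            return request()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is only there so the waiting can be swapped out, for example when testing; in normal use the default `time.sleep` applies.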
 
Software shouldn't crash you say? I think I'm going to need to run this one by the guys in the lab :mischief:
I have not seen this error. I guess we spammed Google spell check too hard? Which file did you use?

edit: nm, didn't see your earlier post. You just reiterating for emphasis?
 
Yeah, because now I don't know whether it finished checking the file or was abruptly shut down.

Also, when I tried to recheck the file, it didn't check anything and just immediately said "too many requests".
Any moderately large (>100 kB) text file would eventually result in this error.

Also we have plenty of text files.
[Screenshot: Dwm 2019-09-22 12-09-16-54.png]

Over 48,000 English text entries.
 
Yeah, but it only uses Google when the other spell check detects a spelling error, so it doesn't make 48,000 requests to Google. Regardless, I will try it out today and see if there is a fix.
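
For anyone curious, the idea of only asking the slow checker about words the fast one rejects, plus caching the answers, can be sketched in Python roughly like this. It is an illustration, not the tool's code: `make_checker`, `local_dictionary` and `online_lookup` are hypothetical names, and the online lookup is just a function you pass in.

```python
def make_checker(local_dictionary, online_lookup):
    """Build a checker that asks online_lookup only about words the local set rejects."""
    cache = {}  # word -> True if the online lookup accepted the spelling

    def is_correct(word):
        key = word.lower()
        if key in local_dictionary:
            return True  # cheap local check passed, no request made
        if key not in cache:
            cache[key] = online_lookup(key)  # expensive call, made once per word
        return cache[key]

    return is_correct
```

With this shape, the number of online requests depends on the number of distinct words the local check rejects, not on the number of text entries.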
 
Okay, new version is in. It is a LOT faster now, like 50x faster (I found a bug in the library I was using, and rewrote it). It now catches errors from Google, but it also calls Google a lot less now anyway.
And it has nice progress dots that show some indication of what it is doing.
I fixed up that file you listed above in about 10 seconds using it.

edit: actually, it looks like it misses some spelling errors in its current form, so I will need to tweak it a bit more.
 
Okay, the last change for now is in: it improves the spell check performance and stops it from missing obvious spelling mistakes.
 
@billw2015 I can't install it anymore: when I launch StartServer.bat, a red wall of text appears and it exits instantly.
As if something Python-related doesn't install fully.
[Screenshots: Dwm 2019-10-19 09-25-34-47.png, Dwm 2019-10-19 09-23-40-04.png, Dwm 2019-10-19 09-23-40-35.png]

I tried deleting the python folders in Tools, but it didn't help.
 
I tried it again myself and it was still working, but I tweaked the script anyway to no longer support 32-bit Windows, as it keeps thinking yours is 32-bit, and installing pythonx86.

I can't read the error message fully, but it looks like a problem renaming the python directory it unzipped to (pythonx86), like it's locked or read-only or something. Hopefully this change will fix it.
 
It worked :D

Maybe the folder rename happened too early, and then something couldn't be installed?
Now it is forced to install into a single folder, as it always installs the 64-bit version.
 