Reading XML via Python: standard xml module is deprecated

The_J

I need some help again :(.

One of my modcomps will become so big that the customizable function definition would be just too big (9 arguments at the moment, and that's not the end), so I thought I should make it customizable via XML.
I asked one of my fellow students how to do it, and he said that there's a standard XML library (name: xml) that could be used. I looked around on the web a bit, and a book from O'Reilly said that it has been part of Python since at least 2.0.
The surprise: this library is not included in Civ.

Then I looked at the standard Civ Python libraries, and xmllib is available there. The description available online says that this library has been deprecated since 2.0.
Okay, that wouldn't matter, but when I import that library in the CvEventManager, Civ loads but throws an error saying that this library has been deprecated. Yes...great...:rolleyes:.


So my question: does anybody see an easy way to read the XML files directly, and not just as plain text?


I hope somebody has an idea and can help me :).
 
That message is just a warning, but the library works fine. I use it for BUG's config files. Be forewarned, it only does SAX parsing--not DOM. It has worked great so far, and IIRC I even have some code in CvAppInterface that imports the module with the file logging disabled to avoid having that error pop up for the user.

BugConfig is the module in BUG that parses the config files, but several modules define their own TagHandlers to make it more extensible. However, you can do some pretty simple parsing without getting too complicated. See that module for the parsing part. Feel free to ask some questions here if you get stuck.
 
Thanks for the tips :).
I'll look at the files and see if they help me :).

How do you partially disable the logging?
The main thing that concerned me was that this error at startup would annoy nearly everyone.


:think: I wonder if just dropping the xml library from my Python install into the right folder would work. I've never worked with a SAX parser, but I guess it's more complicated than working with a DOM parser.
 
The other XML libraries I found all required a C runtime library, and I didn't want to even try to install that into Civ. I suspect it would have to go into the game's core Python libraries--not in the Python folder but in the System folder of Civ4. If you found a Python-only library, you should be able to drop it into your mod no problem.

SAX works by having a callback for the start and end of each tag. On the start you get the attributes as a dictionary. You also get a separate callback for the plain text between XML elements. It's not too hard, but it requires a little more work because you need to maintain the state yourself rather than getting the whole document as a tree.
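
To make the callback idea concrete, here is a minimal sketch. Note the assumptions: it uses xml.sax from a regular Python install rather than the xmllib that ships with Civ4 (the callbacks are named differently there, but the flow is the same), and the Units/Unit/Name/Cost tags and the UnitHandler class are made up for illustration.

```python
import xml.sax

# Hypothetical sample document; the tag names are invented for this sketch.
SAMPLE = "<Units><Unit><Name>Warrior</Name><Cost>15</Cost></Unit></Units>"

class UnitHandler(xml.sax.ContentHandler):
    """Collects each <Unit> as a dict by tracking state between callbacks."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.units = []      # finished results
        self.current = None  # dict for the <Unit> being read, if any
        self.text = []       # text chunks seen since the last tag

    def startElement(self, name, attrs):
        # Called once per start tag; attrs behaves like a dictionary.
        if name == "Unit":
            self.current = {}
        self.text = []

    def characters(self, content):
        # Called for plain text, possibly several times per element.
        self.text.append(content)

    def endElement(self, name):
        # Called once per end tag; the collected text is stored here.
        if name == "Unit":
            self.units.append(self.current)
            self.current = None
        elif self.current is not None:
            self.current[name] = "".join(self.text).strip()
        self.text = []

handler = UnitHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
print(handler.units)
```

The three instance variables are the "state you maintain yourself": the parser never hands over a tree, so the handler has to remember which element it is inside and what text it has seen so far.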

Here's the code from BugConfig that blocks the error. It goes at the top of the module.

Code:
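# block error alert about xmllib deprecation
import sys
stderr = sys.stderr
try:
	# send the deprecation warning to stdout so no error popup appears
	sys.stderr = sys.stdout
	import xmllib
finally:
	# always restore the real stderr, even if the import fails
	sys.stderr = stderr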
 
Thanks for the help :).
Your code works :).

I've now added the basic functions for opening and reading a file, but not for getting the values.

I've looked at the file from the BUG mod and this here (the only source I can find), and I'm not sure if I understand it correctly.

Example:
PHP:
<Person>
    <LastName>Doe</LastName>
    <FirstName>John</FirstName>
</Person>

To get the name, I would have to add a function to the parser, like:
PHP:
    def start_Person(self, attrs):
        MyLastNameVariable = attrs.get("LastName")
        MyFirstNameVariable = attrs.get("FirstName")

That definitely looks wrong to me :D :blush:.
 
In XML there are elements and attributes.

Code:
<element attribute="value">

This is a start tag for an element named "element" with a single attribute named "attribute" with value "value". You could modify your XML structure like this:

Code:
<Person firstName="John" lastName="Doe"/>

Attribute values are constrained somewhat. They cannot contain other XML, so you couldn't make part of a person's name be in italics as you could with your posted structure.

Using attributes makes dealing with the SAX parser easier because you get the element and attribute values all in one function call (start_element), allowing you to create your object(s) right away.

Everything between elements ("John" and "Doe" in your example plus all the whitespace) comes through the handle_data() function in chunks (not guaranteed to be one chunk). This makes dealing with your structure harder but still doable. This is why I went with attributes in BUG.
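
As a rough illustration of why attributes are less work, here is a sketch using xml.sax from a standard Python install (an assumption: it is not the xmllib available in Civ4, where the callback names differ). The PersonHandler class name is hypothetical. Everything needed arrives in the single start-tag callback, so there is no text to buffer.

```python
import xml.sax

class PersonHandler(xml.sax.ContentHandler):
    """Builds (firstName, lastName) tuples straight from start-tag attributes."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.people = []

    def startElement(self, name, attrs):
        if name == "Person":
            # attrs.get() returns None when an attribute is missing.
            self.people.append((attrs.get("firstName"), attrs.get("lastName")))

handler = PersonHandler()
xml.sax.parseString(b'<Person firstName="John" lastName="Doe"/>', handler)
print(handler.people)
```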
 
:lol::wallbash:.
In my Bachelor thesis I had to deal with several XML files, and one of them was generated by another program. All the information in it was stored in attributes, and I thought that somebody must have somehow missed the point of XML, but now I know where that came from :lol::wallbash:.

:think: I wanted to use XML because all modders here know how the files have to look, and I guess attributes could confuse some of them, so using attributes is the last option.

Do you know where handle_data() is called from, what arguments it takes, etc.?
The module description is not very good :dunno:, I can't get anything out of it.
 
I guess it all comes down to how you think of your objects. Take the <p> tag for example. A paragraph has contents: the text and character formatting that make it up. It also has attributes: is it centered or left-justified? How much space before and after it is there?

Perhaps the xmllib reference would help.
 
Thanks for the help again :).

Quote:
I guess it all comes down to how you think of your objects. Take the <p> tag for example. A paragraph has contents: the text and character formatting that make it up. It also has attributes: is it centered or left-justified? How much space before and after it is there?

Is it really that complicated with the SAX parser?
I can't imagine why somebody programmed this library.

Quote:
Perhaps the xmllib reference would help.

handle_data(data)
This method is called to process arbitrary data. It is intended to be overridden by a derived class; the base class implementation does nothing.

Ah, great.
If I have to write my own function to get the data, then I can just as well work without the SAX parser.

-> :mad: d*** sh**, I'll treat it as a text file.


Okay...I know string processing is one of the things Python is very good at, but I haven't really worked with it.
Is there support for patterns/wildcards/regular expressions, something of this type?
I'm thinking of something like
I think of something like
PHP:
mystring = "<"+[a-z,A-Z]*+">"
GivenString = ReadLine()
if mystring is in GivenString:
    delete(GivenString,mystring)
 
At some point, you may come back to the idea of just writing the data you want directly in python and not introducing another filetype. Some modders who can edit XML may be scared to edit python, but this number is small, and you can assist this with a couple of lines of instruction.

Python has a regular expression package called "re". You can google it. I do not have any examples with me right now, but I have used this in the past within civ. You add a line like "import re" and then you can use commands like "re.compile". You do not have to add any external dll/obj files or anything else, just the import command.
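
For what it's worth, a minimal sketch of the re approach could look like the following. The pattern and the parse_line helper are assumptions for illustration. They only cope with a simple <Tag>value</Tag> on one line, with no attributes or nesting, so this is not a real XML parser.

```python
import re

# Matches <Tag>value</Tag> on a single line; \1 requires the same closing tag.
TAG_LINE = re.compile(r"<([A-Za-z]\w*)>(.*?)</\1>")

def parse_line(line):
    """Return (tag, value) for a simple one-line element, or None."""
    match = TAG_LINE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_line("    <LastName>Doe</LastName>"))
print(parse_line("<Person>"))
```

Lines that only open or close a container element, such as <Person>, give None and would need their own handling.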
 
Quote:
At some point, you may come back to the idea of just writing the data you want directly in python and not introducing another filetype. Some modders who can edit XML may be scared to edit python, but this number is small, and you can assist this with a couple of lines of instruction.

Yes, you're right, but my problem is that the function call doesn't fit on my monitor because of the number of variables, so I really want another solution.
And I take this programming part as "getting some more experience".

Quote:
Python has a regular expression package called "re". You can google it. I do not have any examples with me right now, but I have used this in the past within civ. You add a line like "import re" and then you can use commands like "re.compile". You do not have to add any external dll/obj files or anything else, just the import command.

:goodjob: thanks.
I've found the documentation, this will be helpful.
 
Once you start messing with parsing the file yourself, you are pretty much re-inventing what xmllib already does...

The main difference is that you will already know how your stuff works (and you can implement only what you need, rather than a complete parser), but you have to figure out how to use xmllib.
 
I use the re module in BUG in a few places if you need further examples, but it really is easy to use.

You can break function calls onto multiple lines in Python. Anything in between ()s, []s, or {}s can be broken across multiple lines. And you can almost always add () to do so.

Code:
if (something
or something else
and this last thing):
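
The same applies to a long function call, which was the original complaint. A small sketch, with a made-up function and made-up parameter names:

```python
def configure_modcomp(unit_class, cost, strength, moves,
                      enabled=True, era="ANCIENT"):
    """Hypothetical function with many parameters, defined across two lines."""
    return {"unit_class": unit_class, "cost": cost, "strength": strength,
            "moves": moves, "enabled": enabled, "era": era}

# The call sits inside (), so it can be split with one argument per line.
settings = configure_modcomp(
    "UNITCLASS_WARRIOR",
    15,
    2,
    1,
    enabled=False,
    era="CLASSICAL",
)
print(settings["cost"])
```

Passing the optional values by keyword also helps with long argument lists, since each line then says what it sets.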

As for why there is a SAX parser: large files and streaming files. The SAX parser is designed to be processed in chunks. The DOM parser requires that the entire XML document be read into memory to build the DOM. They each have advantages and disadvantages; we're just stuck because there is no DOM parser in Civ4.
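
For contrast, this is roughly what the DOM style looks like with xml.dom.minidom from a standard Python install. As noted, that is not available inside Civ4, so take it only as an illustration of the trade-off: the whole document is loaded first, then queried as a tree. The text_of helper is a hypothetical name.

```python
from xml.dom import minidom

# The whole document is parsed into memory before any data is read.
doc = minidom.parseString(
    "<Person><LastName>Doe</LastName><FirstName>John</FirstName></Person>"
)

def text_of(parent, tag):
    """Return the text of the first <tag> element below parent."""
    node = parent.getElementsByTagName(tag)[0]
    return node.firstChild.data

person = doc.documentElement
print(text_of(person, "FirstName"), text_of(person, "LastName"))
```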
 