Question : reading values from XML files using Python


12monkeys
Mar 13, 2006, 08:39 AM
I'm quite sure this question has been asked several times now, but I failed to find something using forum search.

Is it possible to read values from an XML file via Python? I want to read some values from GlobalDefines.XML, but can't find anything in the API. Maybe there is a Python component that does that?

Does anybody know?

thx in advance.

Belizan
Mar 13, 2006, 03:26 PM
I asked about this a while ago and eventually went to look at Python's XML libraries myself, but I haven't had time to do anything beyond finding and installing them :(. If you get around to it before I do, I'd love to hear what you find out. From what I recall, you'll need to grab the XML SIG package (I think that's what it's called), and you'll probably need to add its support files to the system Python directory tree (in the assets/python folder; it doesn't really matter where exactly, but that's where the other libs of that type are). Civ4 runs Python 2.4, so get the lib for 2.4. I thought I had a link to where I picked up the package I installed, but I don't seem to. I have this link http://www.python.org/community/sigs/current/xml-sig/ but I'm pretty sure that when I was actually looking at this I decided to use a different package, maybe the other one they reference. I'm not finding it right this second. I'll look again later if I have time.

12monkeys
Mar 13, 2006, 04:29 PM
I believe you mean the xml.sax package. Before I started this thread, I also had a very quick look at python.org and was a bit surprised by what I found there. The SAX implementation is a bit too complex for a quick job. What I was looking for is something simple like an INI file reader (file, section, value, return value), just for XML files, but what I saw at python.org is way more than that. I have no experience with XML parsing, so I asked around here first. However, I only need to read 4 values from GlobalDefines.XML, so I have created some constants for them for now. I don't think I will start writing a Civ4 Python XML parser now, maybe in a few weeks when I'm tired of modding Civ4 interfaces ;).
Thanks for the response, anyway.

Belizan
Mar 13, 2006, 07:21 PM
If you only want four specific values, take a look at Python's regex engine. I've only learned Python with respect to Civ4, so I have no depth of knowledge, but most scripting languages have a regex engine, and any such engine should be able to find specific values by pattern quite easily.

Ok, you shamed me into it 8). After a little bit of quick poking around the Python site (noticed some other cool stuff; why haven't I ever looked here before?!?), the lib you want is re. As in import re...

So... If you wanted to find the value of
<DefineName>LAND_TERRAIN</DefineName>
<DefineTextVal>TERRAIN_PLAINS</DefineTextVal>

You'd do something like (I'm just typing this into this window, I've not tested at all)


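import re

# Raw string; \s* tolerates the newline and indentation between the two tags.
pPat = re.compile(r"<DefineName>LAND_TERRAIN</DefineName>\s*<DefineTextVal>([^<]*)</DefineTextVal>", re.IGNORECASE)
# search() scans the whole string; match() would only try at position 0.
pMatch = pPat.search(sFileAsStringPreviouslyLoadedByCodeNotHere)
if pMatch:
    sVal = pMatch.group(1) #This is the value portion of the define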


There's a howto on it here...
http://www.amk.ca/python/howto/regex/
and the module description is here...
http://docs.python.org/lib/module-re.html
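
To make the regex idea above reusable for more than one define, here is a minimal sketch that wraps it in a lookup function. The name get_define_text, the sample string, and the enclosing <Define> element are assumptions made for illustration; only the DefineName/DefineTextVal pair comes from this thread. It is a pattern match on text, not a real XML parser, so it only works while the two tags sit next to each other.

```python
import re

def get_define_text(xml_text, define_name):
    """Return the DefineTextVal that directly follows the given DefineName, or None."""
    pattern = re.compile(
        r"<DefineName>\s*" + re.escape(define_name) + r"\s*</DefineName>"
        r"\s*<DefineTextVal>([^<]*)</DefineTextVal>",
        re.IGNORECASE,
    )
    # search() scans the whole text for the first place the pattern fits.
    match = pattern.search(xml_text)
    if match is None:
        return None
    return match.group(1).strip()

# Hypothetical sample; the <Define> wrapper is an assumption about the file layout.
sample = """
<Define>
    <DefineName>LAND_TERRAIN</DefineName>
    <DefineTextVal>TERRAIN_PLAINS</DefineTextVal>
</Define>
"""
value = get_define_text(sample, "LAND_TERRAIN")
```

In a mod you would pass in the contents of the real file (read with open(...).read()) instead of the sample string, and check for None before using the result.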

SimCutie
Mar 13, 2006, 08:02 PM
This is an improved version of the XML parser I used in my Civilopedia MOD

http://forums.civfanatics.com/showthread.php?t=161149

It is written as simply as possible, so you can easily modify it to suit your needs.


############################ XMLItems ###################################

import warnings
warnings.filterwarnings("ignore", "The xmllib module is obsolete.", DeprecationWarning, "xmllib")
# Yes, but the new xml.sax is not included. So I have to use it.

from xmllib import XMLParser, Error as xmllibError

class XMLItems(XMLParser):

    # def handle_item(self, item_id, item_title, item_text, item_extra ):
    #     pass
    Error = xmllibError
    # Assumed default: the original post uses extra_default without defining it.
    extra_default = ""

    def __init__(self, handle_item):
        XMLParser.__init__(self)
        self.data = self.tag = ""
        self.handle_item = handle_item

    def unknown_starttag(self, tag, attrs):
        if tag in ( "ITEMS", "ITEM", "ID", "TITLE", "EXTRA", "TEXT" ) :
            self.tag = tag
            if tag == "ITEM" :
                self.reset_data()
            elif tag == "TEXT" :
                self.setliteral()

    def handle_data(self, data):
        # Collect character data only while inside a known tag.
        if self.tag:
            self.data += data.strip()

    def unknown_endtag(self, tag):

        if tag == "ID" :
            self.item_id = self.data
        elif tag == "TITLE" :
            self.item_title = self.unescape(self.data)
        elif tag == "EXTRA" :
            self.item_extra = self.unescape(self.data)
        elif tag == "TEXT" :
            self.item_text = self.unescape(self.data)
        elif tag == "ITEM" :
            # Report the item only when id, title and text are all present.
            if self.item_id and self.item_title and self.item_text :
                if self.item_extra :
                    extra = self.item_extra
                else :
                    extra = self.extra_default
                self.handle_item( self, self.item_id, self.item_title, self.item_text, extra )

            self.reset_data()
        elif tag == "ITEMS" :
            pass
        else:
            pass
        self.data = self.tag = ""

    # def syntax_error(self, message):
    #     raise Error('Syntax error at line %d: %s' % (self.lineno, message))

    def reset_data(self):
        self.item_id = self.item_title = self.item_extra = self.item_text = ""

    @staticmethod
    def unescape(s):
        # Civ4 post-processing eats '<' and '[', so unescape '@{' and '@['
        s = s.replace("@{", " <")
        s = s.replace("@[", "[")
        s = s.replace("@_", " ")
        return s

Conquestador
Mar 14, 2006, 09:19 AM
I was just approaching the same problem, but I'm quite far from a solution.

After looking at the Dive Into Python tutorial at http://www.diveintopython.org
I tried the xml.dom.minidom library, which is part of the standard Python library, and made this test:

from xml.dom import minidom

xmldoc = minidom.parse(r'C:\CIV4GameSpeedInfo.Xml')
gamespeeds = xmldoc.getElementsByTagName('GameSpeedInfo')
for gamespeed in gamespeeds:
    SpeedType = gamespeed.getElementsByTagName('Type')
    TrainingSpeed = gamespeed.getElementsByTagName('iTrainPercent')
    print SpeedType[0].toxml(), TrainingSpeed[0].toxml()


The result looks like this:

<Type>GAMESPEED_MARATHON</Type> <iTrainPercent>200</iTrainPercent>
<Type>GAMESPEED_EPIC</Type> <iTrainPercent>125</iTrainPercent>
<Type>GAMESPEED_NORMAL</Type> <iTrainPercent>100</iTrainPercent>
<Type>GAMESPEED_QUICK</Type> <iTrainPercent>80</iTrainPercent>

I'm still trying to figure out how to get only the actual data without the tags. Can somebody help?
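
One possible way to get just the values, sketched under a few assumptions: in minidom the text between the tags is stored in child text nodes of the element, so reading their .data attribute (instead of calling toxml()) gives the bare string. The helper name element_text, the inline sample, and the <GameSpeedInfos> root element are made up for illustration, and this was written against a standard Python install, not tested inside Civ4.

```python
from xml.dom import minidom

def element_text(parent, tag_name):
    """Return the text inside the first <tag_name> under parent, without the tags."""
    node = parent.getElementsByTagName(tag_name)[0]
    # The text is held in child text nodes; join them in case there are several.
    parts = [child.data for child in node.childNodes
             if child.nodeType == child.TEXT_NODE]
    return "".join(parts).strip()

# Hypothetical sample; the <GameSpeedInfos> root is an assumption about the file layout.
sample = """<GameSpeedInfos>
  <GameSpeedInfo><Type>GAMESPEED_MARATHON</Type><iTrainPercent>200</iTrainPercent></GameSpeedInfo>
  <GameSpeedInfo><Type>GAMESPEED_QUICK</Type><iTrainPercent>80</iTrainPercent></GameSpeedInfo>
</GameSpeedInfos>"""

xmldoc = minidom.parseString(sample)
# Build (speed type, training percent) pairs with the values as plain data.
speeds = [
    (element_text(gs, "Type"), int(element_text(gs, "iTrainPercent")))
    for gs in xmldoc.getElementsByTagName("GameSpeedInfo")
]
```

For the simple case in the test above, SpeedType[0].firstChild.data should give the same string, as long as the element contains nothing but text.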