Accents and other Unicode

j_mie6

Hello,

I have been doing some work on my mod (naming city sites, a system similar to Rhye's) and I found myself using a few accents. After the first couple of hitches in the file itself (and resorting to XML to define the accented city names), it works fine.

However, whenever the sys.stdout.write() method is called from PyPrint (i.e. on any event in the EventManager) with text referring to those cities, I get the same exception that appeared when I tried using raw unicode in the Python files:

Code:
Traceback (most recent call last):
  File "CvEventInterface", line 23, in onEvent
  File "CvEventManager", line 188, in handleEvent
  File "CvEventManager", line 918, in onCultureExpansion
  File "CvUtil", line 122, in pyPrint
  File "<string>", line 13, in write
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 9: ordinal not in range(128)

ERR: Python function onEvent failed, module CvEventInterface

Does anybody know how I can change PyPrint to work correctly with the accents?
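For anyone wanting to reproduce this outside the game, here is a minimal sketch. It assumes the log stream implicitly encodes text as ASCII, which is what the traceback suggests. The city name is a made-up example containing ö (u'\xf6', the character named in the error), and the code is written to run on both Python 2 and Python 3:

```python
import sys

# Hypothetical city name containing u'\xf6' (o with umlaut).
name = u'Sch\xf6nbrunn'

try:
    # Encoding to ASCII is what an ASCII-only stream does behind the scenes.
    name.encode('ascii')
    error = None
except UnicodeEncodeError:
    # sys.exc_info() is used so the same code works on old and new Pythons.
    error = sys.exc_info()[1]

# The exception records which character could not be encoded, and where.
bad_char = error.object[error.start]
print(repr(bad_char))
```

The same text encodes without complaint if 'utf-8' is used in place of 'ascii', which is why the error only shows up at the output step.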

In the XML I am using, for example:

Code:
	<TEXT>
		<Tag>CITY_MEXICO_CITY</Tag>
		<English>Ciudad de México</English>
		<French>Ciudad de México</French>
		<German>Ciudad de México</German>
		<Italian>Ciudad de México</Italian>
		<Spanish>Ciudad de México</Spanish>
	</TEXT>
(where the é is actually written as the HTML code &#233;)

This works fine with in-game messages and so on, just not the debug logs.
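To make the XML detail concrete: &#233; is a numeric character reference for code point 233 (0xE9), which is é. The game's XML parser expands these itself, so the sketch below is only an illustration of what that expansion amounts to. The function name expand_numeric_refs is hypothetical, and the code runs on both Python 2 and Python 3:

```python
import re

try:
    unichr            # exists on Python 2
except NameError:
    unichr = chr      # Python 3: chr already returns unicode text

def expand_numeric_refs(text):
    """Replace decimal references such as &#233; with the character they name."""
    return re.sub(u'&#([0-9]+);', lambda m: unichr(int(m.group(1))), text)

# The XML value from the post, as it is written in the file.
city = expand_numeric_refs(u'Ciudad de M&#233;xico')
print(repr(city))
```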

Thanks,

Jamie
 
Though I didn't use PyPrint, I came across similar problems when writing this component. Whenever you output to the console you need to first encode any unicode strings (into UTF-8 or similar). I used the following function:

Code:
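def getOutputEncoding(string):
	'Encodes string correctly for console output'
	# Byte strings (str) are passed through untouched
	if isinstance(string, str):
		return string
	# Anything else is assumed to be unicode and is encoded to UTF-8 bytes
	return string.encode('utf-8')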

Calling encode on a string that is already a standard 'str' (a byte string) can throw an error if it contains non-ASCII bytes, because Python 2 first tries to decode it as ASCII; hence the test for 'str'. If you're also loading strings with non-ASCII characters from a file that isn't read by Civ4's XML parser, you also need to decode them (into unicode). For example:

Code:
mystring = getFancyString(line).decode('utf-8')

Where 'getFancyString' is whatever function you use to read the string, and 'line' is a line of your file obtained using file.readlines(). Make sure your text file is actually saved as UTF-8; if you use a different encoding, be sure to change the above accordingly. Basically, whenever using unicode strings in Python the rule is "Decode in, Encode out".
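As a rough illustration of "Decode in, Encode out", here is a small round trip. It is only a sketch: the sample line is made up, and the file read is simulated by encoding a literal so that the snippet runs on both Python 2 and Python 3 without needing a file on disk:

```python
# Simulate a line read from a UTF-8 file: these are bytes, not text.
raw_line = u'Ciudad de M\xe9xico\n'.encode('utf-8')

# Decode in: bytes -> unicode, as soon as the data enters the program.
city = raw_line.decode('utf-8').strip()

# Inside the program the accented letter counts as a single character.
length = len(city)

# Encode out: unicode -> bytes, only at the point of output.
out_bytes = city.encode('utf-8')

print(length, len(out_bytes))
```

The two lengths differ because é is one character but takes two bytes in UTF-8, which is a quick way to check whether a value is still raw bytes or has already been decoded.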
 
Why not just change the accents?
The tool which you wrote yourself should take care of them, right ^^?

Unfortunately the tool I wrote turns them into what I have, and that fixed the problem that stopped the names being used altogether. Using my XML tags I can pass it to C++ without a hitch.

As for Xyth, all I have to do is amend PyPrint with that code built into it? Sounds easy enough :D thanks! Unicode really does my head in sometimes though :rolleyes:
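One way that amendment might look, as a sketch only. The real body of CvUtil.pyPrint is not shown in this thread, so py_print, encode_for_output and RecordingStream below are all hypothetical names, and the stream is a stand-in for the game's stdout so that the snippet runs on its own (on both Python 2 and Python 3):

```python
try:
    text_type = unicode   # Python 2
except NameError:
    text_type = str       # Python 3, where all text is unicode

def encode_for_output(message):
    """Return bytes suitable for a byte-oriented log stream."""
    if isinstance(message, text_type):
        return message.encode('utf-8')
    return message  # already a byte string, pass it through

class RecordingStream(object):
    """Hypothetical stand-in for the game's stdout; stores what is written."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

def py_print(message, stream):
    # Hypothetical wrapper: encode first, then write, so the stream
    # never has to guess how to turn unicode into bytes.
    stream.write(encode_for_output(message) + encode_for_output(u'\n'))

log = RecordingStream()
py_print(u'Culture expanded in K\xf6ln', log)
```

The point of the design is that the encoding happens in exactly one place, right before the write, so event handlers can keep passing unicode city names around unchanged.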
 