Tutorial: CIV+IDA - Part 2

Ha! This definitely sounds like a hacking quest :) My guess is that it is feasible, but of course it needs investigation.
I have made some (superficial) investigations of my own. ;)
It turns out that function called here is responsible for drawing "next frame" of animation:
Code:
0988:1F00  03C2                add  ax,dx
0988:1F02  054000              add  ax,0040
0988:1F05  50                  push ax
0988:1F06  9A9600E51E          call 1EE5:0096
0988:1F0B  83C406              add  sp,0006
0988:1F0E  B80100              mov  ax,0001
0988:1F11  50                  push ax
[b]0988:1F12  9A34012403          call 0324:0134            <== draw[/b]
0988:1F17  83C402              add  sp,0002
0988:1F1A  8B848218            mov  ax,[si+1882]           ds:[1886]=0001
Unfortunately NOPing it gives pretty ugly results. Unlike fast scrolling on poles when trying to "go across"...
Ok, so from where I stand, there could be 4 reasons, or most likely a combination of those 4 reasons, why you cannot find what you want:
  1. You're using little-endian in IDA (which is wrong): when using hex search in IDA, you should always enter values as big endian, because IDA takes the initiative to convert it to little endian for you...
You have nailed it in first go... :D
  1. Your lookup bytes includes a relocated value: not sure how familiar you are with 16-bit segment addressing and MZ EXE format, but you should know that it has a features called relocations, that makes it possible for MS-DOS to load the EXE anywhere in memory, and still use absolute addressing without breaking the code:
    • Basically, 16-bit segmented code uses segments and offsets to address memory and other parts of the code: offsets are always relative to a segment, so they are insensitive to the place where the EXE is loaded in memory; but segments are absolute: their value correspond to absolute positions in memory, at runtime.
As I just experienced, one should not put segment numbers in hex searches.
Code:
[b]b8 01 00 50 9a 34 01[/b] 82 01
- these are hex codes in .exe file of assembly code from 0988:1F0E on. The segment number is different, just like you said.
I must say I don't like this segment addressing - apart from problems you presented, it is not unique. If I have some linear address in Cheat engine, I cannot go and toggle breakpoint in say 327a5 address, because it can be 327a:0005 or 3270:00a5 or many others.
in my experience, DOSBox always loads CIV_UNP.EXE at segment 1A2h
I think you are right - I recognize this 2BBD segment. :)
  • Your lookup code is in an overlay: even if you managed relocations and endianness, you may still not find your lookup sequence if it is part of an overlay. Indeed, overlays are pieces of code, stored as completely separate MZ executables, but awkwardly located within CIV.EXE itself - just make a seach for the 'MZ' magic bytes in CIV.EXE with a hex editor, and you'll see what I mean:
    • Meant to reduce the memory footprint of the program (at the expense of loading code at runtime), those pieces of code are dynamically loaded, at runtime, when necessary, using an overlay interrupt (int 3Fh)
    • In the main CIV.EXE program, a segment is dedicated to loading overlays (seg020), and everytime a new overlay is loaded, it completely overwrites the segment contents, discarding the overlay that was loaded before.
    • It so happens that a good portion of CIV code lies within overlays (map generation, save/load, most reports, some diplomacy, some AI, etc.)

[*]IDA v5.0 (and presumably earlier versions too) does not recognize overlays, and as such they are not present in your IDA database by default. There is a way to have them all in IDA, and transform overlay interrupts into usual far calls, but I'll keep the details for the next tutorial installment ;)
I think no IDA version would recognize overlays because (I think) it is not standardized thing (like every app which uses overlays does this the same way). I lately have been playing Warlords 2 and it seems that it loads new code all the time - I debugged it and then when later (like two tiles of movement later :) ) I went for the same address it contained very different code. It's kinda beautiful (seeing the lengths programmers went to overcome limits layed by "old" hardware) but also pain in the...
[*]Last but not least, your lookup code is in a driver: by driver I mean one the files MISC.EXE, *GRAPHIC.EXE or *SOUND.CVL. All those files are standalone MZ EXE files which are also loaded dynamically, at runtime, by CIV.EXE, upon choosing the preferred Video and Audio settings:
I won't be meddling with drivers anytime soon. ;)
 
I have made some (superficial) investigations of my own. ;)
It turns out that function called here is responsible for drawing "next frame" of animation:
Code:
0988:1F00  03C2                add  ax,dx
0988:1F02  054000              add  ax,0040
0988:1F05  50                  push ax
0988:1F06  9A9600E51E          call 1EE5:0096
0988:1F0B  83C406              add  sp,0006
0988:1F0E  B80100              mov  ax,0001
0988:1F11  50                  push ax
[b]0988:1F12  9A34012403          call 0324:0134            <== draw[/b]
0988:1F17  83C402              add  sp,0002
0988:1F1A  8B848218            mov  ax,[si+1882]           ds:[1886]=0001

I have no clue where you're getting your segments from... What's your CIV version? is this a copy-paste from DOSBox, or else?

Unfortunately NOPing it gives pretty ugly results. Unlike fast scrolling on poles when trying to "go across"...

NOPing a "far call" always creates havoc, precisely because of the relocation issue: the binary syntax of "far call" is 5 bytes: 9A <offL> <offH> <segL> <segH>.

Since <segL> <segH> stand for a segment base, it is listed in the relocation table and updated at EXE load time, to account for the program start position in memory... So if you set it to NOP NOP (90 90), it will be changed to 9090 + <progStartBase>, which is no longer NOP NOP.

For me, the best way to shunt far calls is to "jump" the bytes: 9A <offL> <offH> <segL> <segH> -> EB 03 <offH><segL> <segH> (jumps right after <segH>)


If I have some linear address in Cheat engine, I cannot go and toggle breakpoint in say 327a5 address, because it can be 327a:0005 or 3270:00a5 or many others.

I am not sure why this is a problem for you: 3270:00a5 and 327a:0005 are actually representing the same physical location in memory, so you may as well put breakpoints on absolute positions, it should work too.

I think no IDA version would recognize overlays because (I think) it is not standardized thing (like every app which uses overlays does this the same way).

I know for a fact that a later version of IDA (not free) does detect overlays and automatically creates segments for them when first parsing the EXE. IDA does not patch the overlay calls though (INT 3Fh).

As a matter of fact, the concept of overlays is an MS-DOS standard, as reflected in the last word of the standard MZ header (bytes 26 and 27) which represents the EXE's "overlay number", as well as the interrupt 3Fh which is dedicated to overlay loading as per MS-DOS specs.

However, the implementation of how to manage overlays is not specified by MS-DOS (I presume), and typically Microsoft and Borland used to provide different implementations of overlay management. If you analyze CIV.EXE code from the start, one of the first thing done is "installing" the interrupt handler for INT 3Fh - that is to say, configure the system to plug-in INT 3Fh into the overlay management code embedded in CIV.EXE (although it was put in CIV.EXE by the MSC compiler, most likely, not by CIV programmers).

I definitely have to progress on PART 3, it seems :p
 
I have no clue where you're getting your segments from... What's your CIV version? is this a copy-paste from DOSBox, or else?
Yes, it is from DOSBOX; Civ version I think .04.
NOPing a "far call" always creates havoc, precisely because of the relocation issue: the binary syntax of "far call" is 5 bytes: 9A <offL> <offH> <segL> <segH>.
Since <segL> <segH> stand for a segment base, it is listed in the relocation table and updated at EXE load time, to account for the program start position in memory... So if you set it to NOP NOP (90 90), it will be changed to 9090 + <progStartBase>, which is no longer NOP NOP.
No I meant NOPing 5 bytes :). It didn't mess up anything just did some not pretty animations. I think what I would like to achieve is making it draw every 2nd, 3rd or maybe even 4th frame of animation. Now it just looks ugly. I should look somewhat deeper into it.. :)
Side remark/observation - in Warlords 2 I patched some code in a way I presumed would crash the game immediately. I needed more bytes to code mov instruction with immediate value and not register:
mov [addr],al => mov [addr], imVal
So I "swallowed" pop or push instruction before. I thought it would break everything right away (messing up stack should have disastrous and immediate results), but it didn't; I don't exactly know why.
I am not sure why this is a problem for you: 3270:00a5 and 327a:0005 are actually representing the same physical location in memory, so you may as well put breakpoints on absolute positions, it should work too.
Yes, BPLM command in DOSBOX gives proper results - breakpoint on write. But if my cheat engine points to some address in code it seems that I have to have correct (at runtime) number of CS and IP to set breakpoint properly. Although it is not a big problem because I do not search for code in Cheat Engine, only use it for "patching" code "on demand". But if I forget what the CS:IP address was, I cannot use any address which points to the same area in code - it must be the address.
I know for a fact that a later version of IDA (not free) does detect overlays and automatically creates segments for them when first parsing the EXE. IDA does not patch the overlay calls though (INT 3Fh).

As a matter of fact, the concept of overlays is an MS-DOS standard, as reflected in the last word of the standard MZ header (bytes 26 and 27) which represents the EXE's "overlay number", as well as the interrupt 3Fh which is dedicated to overlay loading as per MS-DOS specs.

However, the implementation of how to manage overlays is not specified by MS-DOS (I presume), and typically Microsoft and Borland used to provide different implementations of overlay management. If you analyze CIV.EXE code from the start, one of the first thing done is "installing" the interrupt handler for INT 3Fh - that is to say, configure the system to plug-in INT 3Fh into the overlay management code embedded in CIV.EXE (although it was put in CIV.EXE by the MSC compiler, most likely, not by CIV programmers).

I definitely have to progress on PART 3, it seems :p

Good to know. Where do you take your knowledge from? I must say I started my studies in the end of the nineties, so era of DOS and segment addressing were pretty much gone then. :)
One thing is really upsetting - I heard that even if one wanted to buy IDA (and spend 2000$ or so) they only sell it to trusted parties. In such cases I really strongly support piracy - it is not eating to their profits, since they are not selling it anyway, and I don't accept this infringement on my freedom to have tools. ;)
 
No I meant NOPing 5 bytes :). It didn't mess up anything just did some not pretty animations. I think what I would like to achieve is making it draw every 2nd, 3rd or maybe even 4th frame of animation. Now it just looks ugly. I should look somewhat deeper into it.. :)

Ok so did you NOP them in memory while CIV was running? Because even if you NOP the 5 bytes in the cold file CIV.EXE before running, for example you overwrite 9A 12 04 E8 1E with 90 90 90 90 90, the position of the last two bytes is still recorded in the relocation table... And when loading CIV.EXE, it would typically become 90 90 90 32 92 (9090 + 1A2 = 9232), which stands for "db 32; xchg ax, dx;"...

Anyway, if it works, good for you :)

Side remark/observation - in Warlords 2 I patched some code in a way I presumed would crash the game immediately. I needed more bytes to code mov instruction with immediate value and not register:
mov [addr],al => mov [addr], imVal
So I "swallowed" pop or push instruction before. I thought it would break everything right away (messing up stack should have disastrous and immediate results), but it didn't; I don't exactly know why.

Can't tell you why either, without more details :) At this level, every bit counts, and some changes may have little to no impact, while others will just break everything... Detailed analysis is usually inevitable.

Yes, BPLM command in DOSBOX gives proper results - breakpoint on write. But if my cheat engine points to some address in code it seems that I have to have correct (at runtime) number of CS and IP to set breakpoint properly. Although it is not a big problem because I do not search for code in Cheat Engine, only use it for "patching" code "on demand". But if I forget what the CS:IP address was, I cannot use any address which points to the same area in code - it must be the address.

Ok I see, I didn't use CheatEngine yet for CIV, so I didn't know its limitations.


Good to know. Where do you take your knowledge from? I must say I started my studies in the end of the nineties, so era of DOS and segment addressing were pretty much gone then. :)

In fact, the short story is "me + Internet research (+ thank you Gowron)"...

The long story (be prepared, cause it's going completely overboard) is that I have always been more or less interested in programming: I started to play Civ as a kid (following my older brother), and used to fiddle around with it using hex editors and QBasic, for fun. I studied computer science in the early 2000's, and went for an IT job after that, but a little more on the business side of things. Still I always kept a an interest for software/hardware technology.

So I had some of that background, in late 2012, while I was losing some time playing a Civ game as Babylonian Prince, reaching uncontested dominance around 1750... I had been playing Civ once in a while since the early 90's, and used to abuse the settler hack for doing massive land improvements. Once again, I wanted to transform all land, but once again I grew tired of spending long turns clicking around tens of settlers, with no real challenge other than patience, just for the sake of improving all land. So I just started looking for ways to hack my savegame and do it more quickly... and that's how it all started.

From that point on, I spent a lot of time researching: Dack's TerraForm seemed to do the job, but its free version limitations and registration policy just seemed like a "hassle" at the time - I just wanted to modify my MAP after all, right here, right now, and not after waiting 5 days and posting 5 "meaningful" comments to the civfanatics forums (incidentaly, I got a TerraForm license much later, when the 5 days and 5 posts were already achieved for a long time).

So I tried to look at the binary MAP by myself, and discovered that a single change in the land (forest turned to plains) resulted in an almost completely different MAP binary - which sounded to me as "it's gonna be hard".

After some research through the forums and the Internet, I found a seemingly inocuous comment by Gowron mentioning a fan-made (namely Joel McIntyre - I could never find a way to contact him) file viewer for MicroProse's Darklands "PIC" files, that was actually able to open Civ's MAP files as pictures. For some reason, this file viewer could not open Civ's PIC files though, only MAP (I discovered later on that it did not support Civ's EGA/VGA mixed palette format). Also it was just a viewer, not an editor or converter. But quite fortunately, the viewer came along with some C source code for parsing the PIC/MAP image format, together with some very valuable documentation about the PIC format.

After being unsuccessful in porting the C code to Java (which I was much more comfortable with), I pushed the research further trying to clarify what Joel McIntyre just hinted at in his document: the compression format used in PIC files. A lot of research, pen-and-paper, and programming trial and error lead me to strongly suspecting that it was using a variant of the [wiki]LZW[/wiki] algorithm (as suggested by McIntyre himself), together with the easy [wiki]RLE[/wiki].
I further researched LZW and polished my experimental implementations until it reached something rather exciting - this is when I started this thread: MAP, PIC and PAL formats figured out ! (almost).

At this point I could tweak my MAPs, but my expectations were surpassed by the ability to completely extract all CIV art from PIC files... My brain started buzzing into many directions: why only tweak MAPs and not SVEs, too? What about modding CIV art? Or actually modding Civ completely? Or actually make a tool to mod Civ completely?

I pursued those endeavours by creating small tools to extract PICs (and to some extent create PICs) as well as to make basic editions to MAPs and SVEs. Those were later merged as what is now JCivED.

While progressing on JCivED, I felt urges to push it to the end: I wanted a complete understanding of the SVE and MAP contents. Also, I wanted completely understanding of all Civ resources files, including fonts, music and sounds. The fonts (in FONTS.CV) were relatively easy to hack out, although it was quite an interesting experiment in pure pattern analysis and reverse-engineering - it obviously helped that FONTS.CV is not compressed...

But for the SVE and MAP contents, it wasn't that easy: while working through major portions of the SVE format (followed up in this thread, together with massive contributions from Gowron, Dack, Renergy, Whelkman, and others: SVE file format) even with a lot of tweak-and-play sessions, some parts remained very obscure. In particular, the land value algorithm was seemingly impossible to figure out, in spite of considerable effort by Dack, simonnomis and others (I myself wrote pieces code that would make batch screenshot recognition and statistical analysis on terrain / land value... but in vain).

I got intrigued by the extent of Gowron's contribution, and how he could figure out things by working directly with Civ's disassembly in IDA. I had given IDA a try, but always considered assembly as something beyond my reach, so I didn't even consider mastering it in any way. However, I must - as always - credit Gowron for giving my the crucial kickstart that changed everything: the "BreakPeaceDialog" from part 1 of the tutorial actually comes from his first message to me for getting familiar with IDA!

Soon enough, I discovered the existence of overlays: Civ disassembly in IDA is full of "INT 3Fh", with which IDA has a lot of trouble to deal. While researching the Internet for the logic of "INT 3Fh" and overlays, I realized the overlays were right inside CIV.EXE! As well, research proved fruitful in understanding MS-DOS interrupts, and how Civ (as most DOS programs) would configure its own interrupt handlers upon start-up - not only for 3Fh, by the way... Basically, the overlay loading code is right there within CIV.EXE, so all that remained was to understand it.
Days of research later (...) I realized it was possible to add more code to an IDA project, and could import manually all overlays into IDA. At the same time I was playing with IDA scripting, and managed to write a script that would automatically replace all the troublesome "INT 3Fh" with equivalent "far calls" into imported overlays.
Not long after, I found out the "Load/Save SVE" sub-routine that Gowron had mentioned, inside an overaly... This proved a dramatic progress for mapping the CIV data in the disassembly, and general understanding of CIV disassembly altogether... This also made it possible to finally decipher the SVE format (well not really, since some bits and bytes within some data structures remain unclear...)

That's it for my knowledge on overlays :) Of course there's more to tell down the road, how DOSBox Debug mode was crucial for understanding land value calculation for example... As of right now, my further endeavours brought me to starting writing an embryo of IDA-inspired Java-based DOS disassembler (based on distorm3) and decompiler, just to save myself from the "hassle" of manually going through all CIV code - especially the bigger sub-routines... And this leads me to researching things such as "static code analysis", including things in graph algebra, re-compiling DOS-era C decompiler source code, etc. No idea where this is all gonna end...

Re-reading the above, it seems I have my own concept of "hassle"... Also, I wanted to underline how research is essential and usually fruitful with today's Internet: things are not necessarily easy to find, but they can de found - thanks to the many anonymous (or not) people who put them out there.

And a special mention (again) to Randall Hyde's "The Art Of Assembly Programming", the 16-bit DOS version: http://www.phatcode.net/res/223/files/html/toc.html

And also, if you've not seen it yet, a handy online disassembler for easily writing your own DOS bytes (chose 8086 architecture): ODA - The Online Disassembler

One thing is really upsetting - I heard that even if one wanted to buy IDA (and spend 2000$ or so) they only sell it to trusted parties. In such cases I really strongly support piracy - it is not eating to their profits, since they are not selling it anyway, and I don't accept this infringement on my freedom to have tools. ;)

Well that's a point of view :) In that case, it shouldn't be a problem for you to dig out non-free versions of IDA lurking inside the confines of the Internet...
 
Now, in addition to its own code, CIV also loads drivers: MISC.EXE (180 bytes for keyboard and mouse), MGRAPHIC.EXE (7kb, MCGA graphics) and ASOUND.CVL (38kb, AdLib sound). EGRAPHIC.EXE is bigger than MGRAPHIC.EXE for some reason. But all in all, we're still within the bounds: 320kb + 7kb + 38kb = 365kb.
Now, CIV is also allocating graphics memory, at least 3 full 320x200 buffers, with 1 byte per pixel, so that makes 3 x 64000 = 187,5kb (not counting headers).
So we're up to 364kb + 187,5kb = 551,5kb... Still within the bounds.

... well I just (re-)discovered that CIV also allocates 128kb for loading VGA sprites (only 64kb if EGA). So 551.5kb + 128kb = 679.5kb...

 

Attachments

  • dosbox_003.png
    dosbox_003.png
    1.5 KB · Views: 627
Ok so did you NOP them in memory while CIV was running?
That's right.
Because even if you NOP the 5 bytes in the cold file CIV.EXE before running, for example you overwrite 9A 12 04 E8 1E with 90 90 90 90 90, the position of the last two bytes is still recorded in the relocation table... And when loading CIV.EXE, it would typically become 90 90 90 32 92 (9090 + 1A2 = 9232), which stands for "db 32; xchg ax, dx;"...
Really? It works that way? It records every far jump in this table? I wonder if there isn't any simplier way?
...[after some DOSBox "hacking" ;) ]
I just checked what you said and you are correct: I got 3 NOPs and some other instruction(s). :D
Can't tell you why either, without more details :) At this level, every bit counts, and some changes may have little to no impact, while others will just break everything... Detailed analysis is usually inevitable.
I investigated it in more details and it seems that my memory failed me here ;).
I probably changed this code "in place" (and not in exe) but it probably got overwritten before it was executed. This I can understand... :)
Ok I see, I didn't use CheatEngine yet for CIV, so I didn't know its limitations.
This is a limitation of DOSBOX - it shows proper code when you go to address "derived" from linear address, but setting breakpoint there doesn't work.
In fact, the short story is "me + Internet research (+ thank you Gowron)"...
The proper question I think should be "how you are able to find time for all that?" ;)
Your explanations later on are breath-taking - it is a huge commitment.
After being unsuccessful in porting the C code to Java (which I was much more comfortable with)
Really, I thought C/C++ is almost like Java. ;)
That's it for my knowledge on overlays :) Of course there's more to tell down the road, how DOSBox Debug mode was crucial for understanding land value calculation for example...
Is it really essential to know this stuff (I mean land value calculation)? My experience is that AI build cities on Plains, Grasslands and Rivers and NOWHERE ELSE and with regard to distance from other cities it tends to build at a distance of four tiles.
What I would really like to see "in this area" is to force AI to recalculate land value every once in a while. Or to disable this stupid behaviour of building cities closer than 4 tiles to each other. Or to make him build settlers even if it has not got 2 food surplus (otherwise civilzations like Chinese or English are almost always limited to one city).
And a special mention (again) to Randall Hyde's "The Art Of Assembly Programming", the 16-bit DOS version: http://www.phatcode.net/res/223/files/html/toc.html
My problem is I have many things to read in the job area, I have barely time/strength to read anything else...
Well that's a point of view :) In that case, it shouldn't be a problem for you to dig out non-free versions of IDA lurking inside the confines of the Internet...
Well, I heard of the leaked version, but without it we would be empty handed. And I don't like to depend on luck... ;)
 
That's right.

Really? It works that way? It records every far jump in this table? I wonder if there isn't any simplier way?

Yes it works that way: every far jump, far call, data reference, and every other place where a segment base is hard-coded must be recorded in the relocation table.

As a matter of fact, this is the way for IDA (and every other 16-bit disassembly tool, I guess) to auto-detect the segments: by reading the value at every place recorded in the relocation table, all segments in the program (almost) can be deducted.

This is a limitation of DOSBOX - it shows proper code when you go to address "derived" from linear address, but setting breakpoint there doesn't work.

DOSBox breakpoints, got it!

The proper question I think should be "how you are able to find time for all that?" ;)
Your explanations later on are breath-taking - it is a huge commitment.

Just like everyone else, I guess: short nights, early mornings, shaved odd/idle hours, marital risk management... :)
But the real point here is the reward in all that: the amount of knowledge and experience I am amassing is just beyond anything I could imagine. It actually makes me think about career choices...
And also, I am realizing, there's an excitment, a feeling of achievement in doing something that "no one has done before" (regardless that there was no point for anyone to do it before).

Really, I thought C/C++ is almost like Java. ;)

Well, in terms of syntax, Java looks a lot (a lot) like C/C++, but Java is at higher level of abstraction.
In terms of C-to-Java conversion, the devil lies is the details: the source code in question was heavily relying on global variables, pointers/references as return values, and recursive calls passing them through. In addition, it was reverse-engineered from assembly in the first place (don't know by whom), undocumented, and I was trying to port it without understanding what it was doing... In a word: a sure failure. Especially when you know that LZW is all about bit-level manipulations.

Lately I've spent some more time on other C-to-Java porting (a fast disassmebly library called distorm3, and an old MS-DOS decompiler called dcc), and I must say it is just very tedious. The structs and unions are a bore. So are accessing the middle of arrays using their direct address. And the #define directives. And the custom primitive types.

Is it really essential to know this stuff (I mean land value calculation)? My experience is that AI build cities on Plains, Grasslands and Rivers and NOWHERE ELSE and with regard to distance from other cities it tends to build at a distance of four tiles.
What I would really like to see "in this area" is to force AI to recalculate land value every once in a while. Or to disable this stupid behaviour of building cities closer than 4 tiles to each other. Or to make him build settlers even if it has not got 2 food surplus (otherwise civilzations like Chinese or English are almost always limited to one city).

But there's more to it: is Plains A more suitable than Plains B? When you look for a suitable city square, I'm sure you analyze surroundings - and CIV AI does it to, relying on land values.
The context here was not (yet) to hack or fix CIV, but to develop a map editor - JCivED - that players could used seamlessly with their original CIV.EXE: if you allow players to customize their maps, the land values must be calculated for the AI to use the map properly.
Of course, it is not necessary to match the exact algorithm that CIV uses. Dack had the same issue with TerraForm), and he did use an approximation for TerraForm, which did the job, as far as I know.
But when writing JCivED, I strived for exactitude - I also spent an inconsiderate amount of time on the map rendering, up to perfection (especially now that I have the rendering routines disassembled).

My problem is I have many things to read in the job area, I have barely time/strength to read anything else...

Well, I can't help you with that :) Maybe the only way is for such readings to become part of you job, I guess... Mobile browsing/reading on commute may help too.

Well, I heard of the leaked version, but without it we would be empty handed. And I don't like to depend on luck... ;)

Then push your luck! ;)
 
But the real point here is the reward in all that: the amount of knowledge and experience I am amassing is just beyond anything I could imagine. It actually makes me think about career choices...
Ooo yes, I can understand that. I sometimes also find my work being much less exciting than such "hacking" activities.
In terms of C-to-Java conversion, the devil lies is the details: the source code in question was heavily relying on global variables, pointers/references as return values, and recursive calls passing them through. In addition, it was reverse-engineered from assembly in the first place (don't know by whom), undocumented, and I was trying to port it without understanding what it was doing... In a word: a sure failure. Especially when you know that LZW is all about bit-level manipulations.

When you put it like this, it sounds like quite a job.

The context here was not (yet) to hack or fix CIV, but to develop a map editor - JCivED - that players could used seamlessly with their original CIV.EXE: if you allow players to customize their maps, the land values must be calculated for the AI to use the map properly.

Oh yes, I see your point. Although as I recall, AI recalculates terrain after game starts. It is not like you can put different "valuations" on a map and AI simply accepts it. So it is for map editor but only in saved games (or only to build proper [binary-wise] structure of a map).

Well, I can't help you with that :) Maybe the only way is for such readings to become part of you job, I guess... Mobile browsing/reading on commute may help too.

Well, I'm somewhat exagerating - although I always feel kinda bad when I loose big amounts of time for some "lost cause".

BTW. Are you able to "assemble" your code from IDA to an executable? I mean - are you able to insert code, which is crucial for correcting bugs and without "reassembling" you cannot add new code. For example, as you pointed out "Arctic bug" is caused by routine not checking whether it is writing inside its memory area. If I remember correctly, you found a work around for this bug, but I mean - are you able also to correct bugs (by adding or changing code at your discretion and not being limited by requirement to fit in bytes provided by original instructions). Or will you be able to. :)

BTW2. How do you document disassemled code? Do you write pseudo-C code for it and if so, where do you write it.
 
BTW. Are you able to "assemble" your code from IDA to an executable? I mean - are you able to insert code, which is crucial for correcting bugs and without "reassembling" you cannot add new code. For example, as you pointed out "Arctic bug" is caused by routine not checking whether it is writing inside its memory area. If I remember correctly, you found a work around for this bug, but I mean - are you able also to correct bugs (by adding or changing code at your discretion and not being limited by requirement to fit in bytes provided by original instructions). Or will you be able to. :)

Well you can always change code with hex editor. That's how I have done it with my experiences with civ.exe. I haven't found assembler inside free IDA version, don't know if it has one. It would be very helpful of course.

What interest me too is the adding new code. If I ever get into it, for example I might want to try to change the bonus resource placement formula to be more random. And if I ever find the tech cost formula I would certainly want to change that. Plus many more things.

I don't know much about assembly and DOS but would it be possible to add new subroutines to the end of all code? So that you could add calls to the subroutine from certain points from the code?

---

This seems like excellent tutorial, thak you darkpanda.
 
Well you can always change code with hex editor. That's how I have done it with my experiences with civ.exe. I haven't found assembler inside free IDA version, don't know if it has one. It would be very helpful of course.
Well, AFAIK IDA allows to generate .asm file. But my question is whether it is sensible .asm file, which can be assembled or wheter darkpanda corrections to that code make it possible to assemble new .exe.
What interest me too is the adding new code. If I ever get into it, for example I might want to try to change the bonus resource placement formula to be more random.
Well, place of resources is not constant. If you change the "seed" they "move", so only thing you could change is the algorithm which is used to calculate those places.
And if I ever find the tech cost formula I would certainly want to change that.
Well, I don't think you could change too much. It is only 16-bit number, probably signed. I ran into this problem in Civ2; I wanted to increase trade ten times but at the same time to slow discoveries 10 times, so I changed tech paradigm I think 100 times and I think I ran into this barrier - you cannot have more than 32767 requirement for 1 technology.
I think I even looked into this issue in Civ1. Tech paradigm for AI player is 14 - difficulty and for Human player it is 2*difficulty + 6. Although there can be some other modifications, which I overlooked. I think on Prince difficulty it is clearly 10, 20, 30,... but in others it is not that linear.
My interest (and regret that it is very difficult to change) would be to change constants in games. For example in Civ1 there is a limit of 128 armies per player. For 640 kB memory limit it seems like reasonable limitation. But in other games - for example Warlords 3 - it is ridiculous. The whole game is limited to 1000 armies altogether. But this is 32-bit Windows game - we have 4GB of address space and they limit us to lousy 1000 armies? Even in Warlords 1 there weren't any such limitations.
I don't know much about assembly and DOS but would it be possible to add new subroutines to the end of all code?
I don't think it is easy to write such subroutines in the first place. For example with newer games Cheat Engine provides us with mechanism to "patch" code of a game by making "far" jump to memory allocated by Cheat Engine and doing whatever you like. But it would require great understanding of game internals to do it in ways that are not trivial. For example I "patched" in this way Civilization 2 - to make changing terrain also change value of a terrain to AI. But it was very simple modification. To make your own Tech calculations it would require much more. Not to mention it would require extensive testing.
 
Oh yes, I see your point. Although as I recall, AI recalculates terrain after game starts. It is not like you can put different "valuations" on a map and AI simply accepts it. So it is for map editor but only in saved games (or only to build proper [binary-wise] structure of a map).

From what I have seen in version 474.01, those values are never recomputed after game start. They are actually calculated during map generation ("In the beginning, ..."), and then never change, even if you modify the terrain, e.g. turn forests to plains.

So, indeed, the AI does simply accept your values from the editor. Which gives a lot of leverage in terms of scenario design, if you want to force AI to build cities on specific squares only.



BTW. Are you able to "assemble" your code from IDA to an executable? I mean - are you able to insert code, which is crucial for correcting bugs and without "reassembling" you cannot add new code. For example, as you pointed out "Arctic bug" is caused by routine not checking whether it is writing inside its memory area. If I remember correctly, you found a work around for this bug, but I mean - are you able also to correct bugs (by adding or changing code at your discretion and not being limited by requirement to fit in bytes provided by original instructions). Or will you be able to. :)

To answer shortly:
  • I can't assemble from IDA (yet): all assembly code I ever wrote for patches was manually designed, byte by byte, by taking inspiration from IDA's disassembly/binary
  • I did try to insert my own code, but only by overwriting an existing sub-routine, "random()"; the code overwrite did work properly, except that it didn't behave as I expected, and I never debugged its logic, nor released it to the public
  • Now that I successfully merged overlays back in the main CIV.EXE, I know for a fact that it is technically possible to add arbitrary code. The biggest challenge remains to write the code, and also record potential relocations; still, I think it is nearly impossible to do the whole thing manually, some custom scripts or programs should be written to help with the complex updates

BTW2. How do you document disassembled code? Do you write pseudo-C code for it and if so, where do you write it.

In IDA itself, you can comment any disassembled line by pressing ";" or ":". This make it possible to explain in details what's happening, as shown below for example:

comments.png

For the reverse-engineered code for PIC decompression, it was proper C code, but Joel McIntyre mentioned that it was written based on disassembly (unavailable). The C code itself did not contain any comment, and the variable names were not very telling. In a word: not well documented :)

Myself, for some parts of the disassembled code, I have Eclipse opened right next to IDA and I manually translate from assembly to Java... Then debug the thing using DOSBox debug, comparing register values with what is expected, etc. This is how I did for land value computation, random map generation, unit and city rendering, random routine, etc.

I also started to do it for the city screen, but it is very, very big, and that's why I started to investigate possibilities of having part of this work done automatically (i.e. decompiler).
 
What interest me too is the adding new code. If I ever get into it, for example I might want to try to change the bonus resource placement formula to be more random.

Well, place of resources is not constant. If you change the "seed" they "move", so only thing you could change is the algorithm which is used to calculate those places.

I think that's what hannurabi wants to do: change the algorithm itself... But it's true that the hardest part is to first design the algorithm to achieve the randomness you wish. It is really not straightforward. After that, you may either overwrite the original routine (your custom routine thus needs to be shorter or of equal byte size), or insert it inside the original code, while pushing other bytes around to make room, redirecting calls to your routine, and adding proper relocations to the table.


I don't know much about assembly and DOS but would it be possible to add new subroutines to the end of all code? So that you could add calls to the subroutine from certain points from the code?

Yes it is possible - but quite complex... See my response to M-Z's post, previously. I'm actually thinking of making a tutorial and/or a tool to explain and demonstrate this.

This seems like excellent tutorial, thak you darkpanda.

Thanks :)

For example in Civ1 there is a limit of 128 armies per player. For 640 kB memory limit it seems like reasonable limitation.

In Civ1, one unit takes 12 bytes (0x0C). The memory is statically allocated to units, for all possible units in-game: that makes a total of 128 Units x 8 Civs (incl. Barbarians) x 12 bytes = 12,288 bytes (12kb):
  • In terms of memory, changing this to twice as many units, for example (i.e. 256 units per Civ), there should be enough conventional memory (< 640kb) to make it work
  • In terms of code, it seems impossible: the value 128 is hard-coded in every single unit loop... Most likely, the C source code must have had a '#define MAX_UNITS 128' somewhere in civ.h (file name is a pure speculation), and then everywhere in the code something like' for(i=0;i<MAX_UNITS;i++) { ...'; in addition to that, all data located after the units would have to be pushed away to make room for the additional units, and all references to this data should be moved as well... or alternatively the unit data itself should be relocated to the end of memory, memory allocations changed to accomodate for the new size, and all references to the unit data area changed to the new location...

Thinking about it, it may actually be possible... (scratching my head...) It actually makes me want to try ;)
 
I don't mean to hijack this thread but this got my interest:

I think that's what hannurabi wants to do: change the algorithm itself... But it's true that the hardest part is to first design the algorithm to achieve the randomness you wish. It is really not straightforward. After that, you may either overwrite the original routine (your custom routine thus needs to be shorter or of equal byte size), or insert it inside the original code, while pushing other bytes around to make room, redirecting calls to your routine, and adding proper relocations to the table.

At first it seemed quite hard to get random enough results for algorithm, since the bonus tile is determined by formula and not by variable. (Am I correct?) But I just tried different things with the help of my civclone and came up with something like this:

Spoiler :
Code:
unsigned int seed =  'constant random seed';

	for (int x = 0; x < Width; x++)
	{
		for (int y = 0; y < Height; y++)
		{
			unsigned int randstate = seed;
			int i = 0;
			while (i < x * y)
			{
				randstate = randstate * 1103515245 + 12345;

				if (randstate % 3 == 0)
					i = i - 1;
				else
					i = i + 1;
			}

			if (randstate % 8 == 0)
				bonus(x,y) = true;
		}
	}

It's gives results that seem to be quite random. Also, often bonus resources are plenty and sometimes they are just few. See attached pics. I'm not mathematician so I have no clue why the algorithm gives those results. I think this also needs some further testing and adjustment too.

Next challenge is to overwrite the Civdos formula with assembly version of this formula, don't know how hard or easy or possible that might actually be.

It's worth saying that it's possible that I have made some logic failure and it's not possible to add this into civdos:p
 

Attachments

  • rnd_example1.png
    rnd_example1.png
    107 KB · Views: 269
  • rnd_example2.png
    rnd_example2.png
    69 KB · Views: 207
From what I have seen in version 474.01, those values are never recomputed after game start. They are actually calculated during map generation ("In the beginning, ..."), and then never change, even if you modify the terrain, e.g. turn forests to plains.
I meant that you can place custom valuation on Earth map (at least in your editor, AFAIR), but the AI recalculates it anyway after you choose to play on Earth.
So, indeed, the AI does simply accept your values from the editor. Which gives a lot of leverage in terms of scenario design, if you want to force AI to build cities on specific squares only.
That is interesting observation in the sense that in Civ2 where we actually had scenarios, there weren't any possibility of changing terrain valuations. And playing WW2 scenario with Soviets founding new cities in their vast territory was kind of ridiculous.
To answer shortly:
  • Now that I successfully merged overlays back in the main CIV.EXE, I know for a fact that it is technically possible to add arbitrary code. The biggest challenge remains to write the code, and also record potential relocations; still, I think it is nearly impossible to do the whole thing manually, some custom scripts or programs should be written to help with the complex updates
When one thinks through many aspects of such task, the complexity becomes even greater. For example another problem arises, namely save files would stop working. In Civ it doesn't matter that much - new saves would work. But with my Warlord example all scenarios would stop working.
Now that you mentioned these relocations, one see how many of these things are obscured by high level programming languages. :)
Myself, for some parts of the disassembled code, I have Eclipse opened right next to IDA and I manually translate from assembly to Java... Then debug the thing using DOSBox debug, comparing register values with what is expected, etc. This is how I did for land value computation, random map generation, unit and city rendering, random routine, etc.
Organizing such task in a way that will be readable in 3 months to the future seems like a problematic thing. In my small investigations I ended up doing some things twice because I failed to properly document them. :)
I also started to do it for the city screen, but it is very, very big, and that's why I started to investigate possibilities of having part of this work done automatically (i.e. decompiler).

I must say I doubt in readability of such automatic output.
 
I don't mean to hijack this thread but this got my interest:



At first it seemed quite hard to get random enough results for algorithm, since the bonus tile is determined by formula and not by variable. (Am I correct?) But I just tried different things with the help of my civclone and came up with something like this:

Spoiler :
Code:
unsigned int seed =  'constant random seed';

	for (int x = 0; x < Width; x++)
	{
		for (int y = 0; y < Height; y++)
		{
			unsigned int randstate = seed;
			int i = 0;
			while (i < x * y)
			{
				randstate = randstate * 1103515245 + 12345;

				if (randstate % 3 == 0)
					i = i - 1;
				else
					i = i + 1;
			}

			if (randstate % 8 == 0)
				bonus(x,y) = true;
		}
	}

It's gives results that seem to be quite random. Also, often bonus resources are plenty and sometimes they are just few. See attached pics. I'm not mathematician so I have no clue why the algorithm gives those results. I think this also needs some further testing and adjustment too.

Next challenge is to overwrite the Civdos formula with assembly version of this formula, don't know how hard or easy or possible that might actually be.

It's worth saying that it's possible that I have made some logic failure and it's not possible to add this into civdos:p

Your formula looks interesting, but there's a catch: in CivDOS, the special resource pattern is implemented as a function, with the following signature:
boolean isSpecialResource(int x, int y);

This function contains the famous formula, which is recomputed, only for x and y, every time the function is called.

In your example, you described a process of spreading around the special resources... But after this process is done, how can you write it as a function with the same signature as above?

The only alternatives I see are:
  • Run the full algorithm on every call, and stop when the input x,y is reached -> too long
  • Store the special resource pattern after initial processing, then lookup stored values on every call -> very different from existing implementation, and so quite heavy in terms of hacking

I have thought, before, of the latter approach, and coming back to it now, it seems more and more achievable (there's room in the MAP to store special resources and huts).

As a matter of fact, I suggest we redirect this discussion to this other thread I started some time ago, for this exact purpose: Civ1 Advanced hacking: special resource and hut patterns
 
I meant that you can place custom valuation on Earth map (at least in your editor, AFAIR), but the AI recalculates it anyway after you choose to play on Earth.

Yes, starting with Earth will regenerate land values. Actually, it is not possible to define land values within the Earth's MAP.PIC file - it only contains geography.

But it is possible to modify land values inside a gamesave CIVIL#.MAP file, and in that case CIV's AI will accept it "as is".

Now that you mentioned these relocations, one see how many of these things are obscured by high level programming languages. :)

Note that the relocation table is not about the language or the program itself, it's about the OS "packaging" of the executable, in our case for "MZ EXE" for 16-bit MS-DOS.
If you compile the same C source code for another system (e.g. Win32), then you will have a different packaging logic, such as "PE", without the relocation table (but surely other intricate stuff as well, such as DLL dependencies, etc.)

Organizing such task in a way that will be readable in 3 months to the future seems like a problematic thing. In my small investigations I ended up doing some things twice because I failed to properly document them. :)

Sounds familiar :D

I must say I doubt in readability of such automatic output.

The main issue I have with manually parsing the assembly is to re-construct the control structures (for loops, while loops, if-then-else, switch...). I don't necessarily need to decompile the contents of each block, but the general control flow.
For very large sub-routines, IDA is not very practical, and my belief is that it would be much easier to reverse-engineer with a kind of "indented" assembly, following the control flow.
Aside from that, I think I have what I need to re-use the variables and functions namings from IDA into my own decompiling code :)
 
Actually, it is not possible to define land values within the Earth's MAP.PIC file - it only contains geography.
So in JCived you only didn't want to change UI and this valuations defined there goes nowhere?
Note that the relocation table is not about the language or the program itself, it's about the OS "packaging" of the executable, in our case for "MZ EXE" for 16-bit MS-DOS.
If you compile the same C source code for another system (e.g. Win32), then you will have a different packaging logic, such as "PE", without the relocation table (but surely other intricate stuff as well, such as DLL dependencies, etc.)
I meant what you said about changing all references which comes after the expanded area (in case we were to increase max number of units). In C (probably) there wouldn't be any other change than something like this:
Code:
struct Unit { (...) } UnitTable[PLnum][128]
to
Code:
struct Unit { (...) } UnitTable[PLnum][256]

But as you mentioned, in Assembler there would be plenty of work - all code which uses data which "lies after" expanded area would have to be changed.
In 32-bit Windows application (not to mention 64-bit) I think it would be easier to relocate the whole structure which one wants to expand and then only change references to this structure, leaving everything else intact.
For very large sub-routines, IDA is not very practical, and my belief is that it would be much easier to reverse-engineer with a kind of "indented" assembly, following the control flow.
That reminds me - is there a way to force IDA to draw more on a graph view than it is doing by default? It wouldn't necessarly help with large functions, nonetheless it could be useful in some cases.
 
So in JCived you only didn't want to change UI and this valuations defined there goes nowhere?

In short: yes, this valuation goes nowhere for exported Earth maps. But if you change them for an actual gamesave, they will be kept.

I don't know if you're aware of this trick, that Dack discovered: if you edit a gamesave, set its date to 4000 BC and turn count to 0, then loading this gamesave will actually take you to the "Difficulty" and "Civ" selection screen, just as if you started the game... So this gives you the possibility to create a full fledged scenario as a "gamesave" (including land values, and other things), and still let the user start it as a "new" game when loading it.

M-Z said:
I meant what you said about changing all references which comes after the expanded area (in case we were to increase max number of units). In C (probably) there wouldn't be any other change than something like this:
Code:
struct Unit { (...) } UnitTable[PLnum][128]
to
Code:
struct Unit { (...) } UnitTable[PLnum][256]

But as you mentioned, in Assembler there would be plenty of work - all code which uses data which "lies after" expanded area would have to be changed.
In 32-bit Windows application (not to mention 64-bit) I think it would be easier to relocate the whole structure which one wants to expand and then only change references to this structure, leaving everything else intact.

I don't think it is any easier in 16, 32 or 64 bit assembly... The real issue here is that this memory area is pre-allocated, because of the C array declaration "Unit[xxx]". In Win32, I believe an array declaration like that would also result in statically allocated space in the assembly - but I may be wrong, my Win32 experience is close to zero...

If it was dynamically allocated, like "malloc(MAX_UNITS*sizeof(Unit))", then I guess it would be much simpler, especially in Win32 (or 64) where you don't care too much about memory limitations: you would just need to change the MAX_UNITS value (and of course, all the loops that use MAX_UNITS as their boundary condition check).

M-Z said:
That reminds me - is there a way to force IDA to draw more on a graph view than it is doing by default? It wouldn't necessarly help with large functions, nonetheless it could be useful in some cases.

What more would you like to see?
Within sub-routines, the graph view already shows everything, and you can even assign specific colors and names to individual code blocks.
But personally, I would really like to be able to move blocks around and for IDA to properly readjust edges, because sometimes the automatic layout is really counter-intuitive....
 
I don't know if you're aware of this trick, that Dack discovered: if you edit a gamesave, set its date to 4000 BC and turn count to 0 (...) So this gives you the possibility to create a full fledged scenario as a "gamesave" (including land values, and other things), and still let the user start it as a "new" game when loading it.
I was aware of this trick but I haven't realized possibilities given by combination of these two things.
I only used somewhat different trick to force Civilization to enable saving game on first turn - changing turn counter to non-zero value. ;)
I don't think it is any easier in 16, 32 or 64 bit assembly... The real issue here is that this memory area is pre-allocated, because of the C array declaration "Unit[xxx]".
I only meant that with limited address space in DOS it may be difficult to simply relocate a "table", and like maybe segmentation adds some troubles too (but this may be my ignorance about segmentation). Maybe it is too obvious observation, but since we need more memory nonetheless, it is better to relocate data to be expanded than to expand it in place moving everything else.
The same maybe true in "expanding" the code. For example "jumping out" from expanded function to new memory area and then returning, rather than relocating code of other functions so to free some space "in place".
If it was dynamically allocated, like "malloc(MAX_UNITS*sizeof(Unit))", then I guess it would be much simpler, especially in Win32 (or 64) where you don't care too much about memory limitations: you would just need to change the MAX_UNITS value (and of course, all the loops that use MAX_UNITS as their boundary condition check).
Yes, I think malloc/new is easier, but it doesn't suffer (I think) from the problem of changing references in code (only as you mentioned loop counters and such). Since it is dynamic, there must be somewhere in static data a pointer to new memory and its size is not that relevant.
What more would you like to see?
Within sub-routines, the graph view already shows everything, and you can even assign specific colors and names to individual code blocks.
Well, I meant to show maybe not FULL graph of a program, but somewhat more than IDA displays by default. I think there is a way to build full graph, but only in this external Microsoft "graph drawing" application. I contemplated possibility of having bigger graphs in graph views.
But personally, I would really like to be able to move blocks around and for IDA to properly readjust edges, because sometimes the automatic layout is really counter-intuitive....
It seems like lot of new metadata to save to IDA project... :)
 
Well, I meant to show maybe not FULL graph of a program, but somewhat more than IDA displays by default. I think there is a way to build full graph, but only in this external Microsoft "graph drawing" application. I contemplated possibility of having bigger graphs in graph views.

Actually, the main reason why I started to look at doing my own disassembly/decompiling tool, is because IDA fails to display the graph view for the 2nd biggest sub-routine in CIV, which I called "display main screen", although I am not certain what it does in details. This is very strange, because the "city process" sub-routine is bigger, and yet IDA can show its graph view without any problem (after having set the number of possible graph nodes to the maximum value).

So I wanted to have a graph view for this sub-routine by myself... Still, it's far from easy :)
 
Top Bottom