Sorry for poking this again, but I think there's some highly relevant information here, and more details to be discussed!
I might have a bit to add regarding overlays, as well as some questions. I'm also involved in analyzing a program compiled with overlays. Thanks to darkpanda for some very good hints! To go a bit further:
- overlaid programs compiled with old Microsoft C compiler versions (4, 5, 6, 7, probably at least up to Visual C++ 1.x?) can use an overlay manager such as the one darkpanda described. I eventually found out that the source code for this manager was included in the VCC 1 release! Google for "vccrt1.zip", which contains the file "OVLM6L.ASM" in MISC\DOS\.
- the INT 3F calls are mostly as darkpanda described, but with a slight difference: the first byte after CD 3F is called the "segment number" and is not necessarily mapped 1:1 to the "overlay number". There's actually a lookup table ("Segment number to overlay number" in the source code) that gives the overlay number.
Here's an example of what I mean, from my own project (not Civ):
Code:
int3F [segnumber] [offset] => ovl_number (from LUT)
CD 3F 3A 02 00 => 16
CD 3F 3B 2C 01 => 16
CD 3F 3B 06 00 => 16
CD 3F 3C 00 00 => 17
And of course the "overlay number" refers to the individual "chunks" (each with their own MZ header and relocations) packed at the end of the main .exe.
I assume in Civ the mapping must have happened to be 1:1, but that would depend on how the .exe was linked originally. Presumably, if multiple units were linked/combined per overlay (say, 2-3 source files), this might explain what I'm seeing: perhaps each "segnumber" corresponds to one source file. Just speculation.
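To make the lookup concrete, here is a minimal Python sketch of decoding one of these thunks. The table contents are just the three values from my example above, and the names SEG_TO_OVL and decode_int3f are made up for illustration; in a real binary the table would have to be read from the overlay manager's data.

```python
# Hypothetical table: only the three entries seen in the example above.
# In a real binary, read it from the overlay manager's
# "segment number to overlay number" data instead.
SEG_TO_OVL = {0x3A: 16, 0x3B: 16, 0x3C: 17}

def decode_int3f(code, pos, seg_to_ovl):
    """Decode one 5-byte 'CD 3F seg off_lo off_hi' thunk at code[pos:pos+5].

    Returns (segment_number, offset, overlay_number).
    """
    thunk = code[pos:pos + 5]
    if len(thunk) != 5 or thunk[:2] != b"\xCD\x3F":
        raise ValueError("not an INT 3F thunk")
    seg_number = thunk[2]
    offset = int.from_bytes(thunk[3:5], "little")  # 16-bit little-endian offset
    return seg_number, offset, seg_to_ovl[seg_number]

if __name__ == "__main__":
    for raw in ("CD 3F 3A 02 00", "CD 3F 3B 2C 01", "CD 3F 3C 00 00"):
        seg, off, ovl = decode_int3f(bytes.fromhex(raw), 0, SEG_TO_OVL)
        print("seg %02X offset %04X -> overlay %d" % (seg, off, ovl))
```

Note how 3A and 3B both land in overlay 16 but keep their own offsets, which is why the segment number can't simply be treated as the overlay number.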
- the ovl manager allows calls *between* overlays, which is nifty but makes analysis more complicated. It handles this by maintaining its own call stack so that on calls / returns between overlays it can swap in the correct chunks. So the best approach is to "flatten" the .exe like darkpanda did; although it will no longer be a valid DOS .exe, it will serve perfectly for static analysis. There are some issues, though, if the whole thing goes beyond FFFF:FFFF (the top of the real-mode address space, just over 1MB): IDA is quite happy to map stuff at 10000:0000 and higher, but the call fixups will be impossible, because a far call only has room for a 16-bit segment. For example:
Code:
;original opcode:
CD 3F [segno] [offset_lo] [offset_hi] ;this is "int 3F" followed by 3 bytes to identify destination
;change to a "call far" opcode
9A [addr_l] [addr_h] [seg_l] [seg_h] ;obviously can't point to a seg > 0xFFFF !
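The byte-level patch above can be sketched in Python as follows. This is only an illustration: int3f_to_call_far is a made-up helper, and target_seg is assumed to be whatever segment the flattened overlay code for that thunk ended up at, which you have to work out yourself.

```python
import struct

def int3f_to_call_far(thunk, target_seg):
    """Build the 5-byte 'call far target_seg:offset' that replaces an INT 3F thunk.

    thunk      -- the original 5 bytes: CD 3F segno offset_lo offset_hi
    target_seg -- hypothetical segment where the destination code was loaded
    """
    if len(thunk) != 5 or thunk[:2] != b"\xCD\x3F":
        raise ValueError("expected a 5-byte INT 3F thunk")
    if not 0 <= target_seg <= 0xFFFF:
        # This is exactly the problem described above: no room for seg > 0xFFFF.
        raise ValueError("segment %#x does not fit in a far call" % target_seg)
    offset = struct.unpack_from("<H", thunk, 3)[0]
    # 9A, then offset (little-endian), then segment (little-endian)
    return struct.pack("<BHH", 0x9A, offset, target_seg)

if __name__ == "__main__":
    patched = int3f_to_call_far(bytes.fromhex("CD 3F 3B 2C 01"), 0x2000)
    print(patched.hex(" "))
```

Since both encodings are 5 bytes, the patch can be done in place without shifting any code.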
- one issue I'm having (same as RobinHood70 mentioned above): the overlays are meant to be loaded at a specific location.
Code:
layout in memory
(when running)
+---------------------------+ IMG_BASE
| |
| main code (overlay 000)|
| |
| |
+---------------------------+ OVL_BASE
| overlays loaded here |
| (initially all 0x00) |
| |
| |
+---------------------------+ OVL_BASE + max_ovl_size
| |
| rest of main code |
| (ovl 000) |
| |
| also data segment etc. |
| |
| |
+---------------------------+ SS:0000
| |
| stack |
| |
+---------------------------+ initial SS:SP (top of stack)
The overlay manager, when swapping in a new ovl, goes through its relocation table just as would happen when a regular .exe is loaded. The problem is: if we force the overlays to be loaded after the main image, as happens in IDA when doing "load additional binary", the overlay's relocations are not parsed. So if, for example, the overlay calls functions in the main code (ovl_000), those cross-references will be invalid in IDA. Same thing for global/shared data.
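For reference, here is a rough Python sketch of what "going through the relocation table" means for one overlay chunk, using the standard MZ header layout (relocation count at 0x06, header size in paragraphs at 0x08, relocation table offset at 0x18). It mimics the plain DOS loader, adding a single load segment to every fixup; I'm assuming the overlay manager does the equivalent, and the function names are made up.

```python
import struct

def read_mz_relocations(chunk):
    """Return (header_size_in_bytes, [(offset, segment), ...]) for one MZ chunk."""
    if chunk[:2] != b"MZ":
        raise ValueError("no MZ signature")
    reloc_count, header_paras = struct.unpack_from("<HH", chunk, 0x06)
    reloc_table_off = struct.unpack_from("<H", chunk, 0x18)[0]
    # Each entry is two 16-bit words: offset first, then segment.
    relocs = [struct.unpack_from("<HH", chunk, reloc_table_off + 4 * i)
              for i in range(reloc_count)]
    return header_paras * 16, relocs

def apply_relocations(chunk, load_seg):
    """Return the chunk's code/data with load_seg added to every relocated word."""
    header_size, relocs = read_mz_relocations(chunk)
    image = bytearray(chunk[header_size:])
    for off, seg in relocs:
        pos = seg * 16 + off  # position of the fixup inside the load image
        value = struct.unpack_from("<H", image, pos)[0]
        struct.pack_into("<H", image, pos, (value + load_seg) & 0xFFFF)
    return bytes(image)
```

When the chunk is loaded as a raw binary, none of this happens, so every relocated word still holds its unadjusted, image-relative segment value.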
darkpanda: I'd be interested to hear more about how you dealt with this. Perhaps it wasn't an issue in Civ, or you were able to simply ignore the problematic refs?
RobinHood70: depending on darkpanda's answer, I have a plan to deal with the relocations manually, because it's a problem for my project. The idea is to load the overlays at the end of the image (before the stack; I'd sabotage the stack ptr afterwards since it's not needed for static analysis). The relocs would then need to be adjusted individually depending on where they point (before, inside, or after the overlay image). This could probably be done in an IDC script.
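The before / inside / after decision could look something like this, sketched in Python rather than IDC just to show the logic. Everything here is an assumption on my part: rebase_target is a made-up name, all values are segment (paragraph) numbers relative to the image base, and the main image is assumed to stay where it is, with the empty overlay window left in place, so only targets inside the window move.

```python
def rebase_target(target_seg, ovl_base, ovl_paras, new_ovl_seg):
    """Map a relocated segment from the runtime layout to the flattened layout.

    target_seg  -- segment value found at a relocation site
    ovl_base    -- segment of OVL_BASE (start of the overlay window)
    ovl_paras   -- size of the overlay window in paragraphs
    new_ovl_seg -- segment where this overlay was placed in the flattened image
    """
    if target_seg < ovl_base:
        return target_seg  # main code before the overlay window: unchanged
    if target_seg < ovl_base + ovl_paras:
        new_seg = target_seg - ovl_base + new_ovl_seg  # inside the overlay: moves
        if new_seg > 0xFFFF:
            raise ValueError("rebased segment %#x exceeds 16 bits" % new_seg)
        return new_seg
    return target_seg  # rest of main code / data after the window: unchanged

if __name__ == "__main__":
    for seg in (0x0100, 0x0810, 0x0A00):
        print("%04X -> %04X" % (seg, rebase_target(seg, 0x0800, 0x0200, 0x3000)))
```

If the flattened layout also moved parts of the main image, the first and last branches would need their own adjustments.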