
Tutorial: CIV+IDA - Part 2

Discussion in 'Civ1 - General Discussions' started by darkpanda, Mar 20, 2014.

  1. darkpanda

    darkpanda Dark Prince

    Joined:
    Oct 28, 2007
    Messages:
    604

    Interestingly enough, IDA can successfully display the diagram for this sub-routine when disassembling CIV.EXE v474.04... Beats me as to why it can't for v01, but my life just got easier!
     
  2. MM1nd

    MM1nd Chieftain

    Joined:
    Oct 2, 2010
    Messages:
    8
    Sorry for the thread necromancy.

    Could you give some details on how you did this? Any hint would be helpful; the full IDA script, of course, would be superb.
     
  3. darkpanda

    darkpanda Dark Prince

    I am not sure which details you need, but here's a breakdown anyway:

    All overlays are MZ-formatted executables, located after the main MZ executable contained in CIV.EXE.
    Just look for the "MZ" plain-text string in CIV.EXE, and you should find 25 occurrences:
    • 1 at the beginning of CIV.EXE
    • 1 in the middle of plenty of other bytes: the code that loads overlays does a comparison with the "MZ" string
    • 23 other occurrences, all beginning at regular offsets after long runs of 00 padding: those are the 23 overlays of CIV.EXE
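    A rough Python sketch (not the original extraction program) of how the plain "MZ" matches could be filtered: a real MZ header has self-consistent size fields, while the stray "MZ" inside the loader code most likely does not. The helper names `parse_mz` and `find_mz_chunks` are hypothetical; the layout is the standard 0x1C-byte MZ header, whose last word (`e_ovno`) is the header's overlay-number field (whether the linker fills it in for CIV.EXE is an assumption to check).

```python
from __future__ import annotations

import struct

# Fixed part of an MZ header: the "MZ" signature followed by 13 little-endian words (0x1C bytes).
MZ_FIXED = struct.Struct("<2s13H")


def parse_mz(data: bytes, off: int) -> dict | None:
    """Parse the MZ header at `off`, or return None if its fields look implausible."""
    if data[off:off + 2] != b"MZ" or off + MZ_FIXED.size > len(data):
        return None
    (_, cblp, cp, crlc, cparhdr, _minalloc, _maxalloc,
     _ss, _sp, _csum, _ip, _cs, lfarlc, ovno) = MZ_FIXED.unpack_from(data, off)
    hdr_size = cparhdr * 16
    # Bytes covered by the chunk (header + image); cblp == 0 means the last page is full.
    file_size = (cp - 1) * 512 + (cblp if cblp else 512) if cp else 0
    if cblp >= 512 or cp == 0 or hdr_size < MZ_FIXED.size or file_size < hdr_size:
        return None
    if crlc and lfarlc + 4 * crlc > hdr_size:
        return None  # the relocation table would not fit inside the header
    if off + hdr_size > len(data):
        return None
    return {"offset": off, "hdr_size": hdr_size, "file_size": file_size,
            "relocs": crlc, "reloc_off": lfarlc, "ovno": ovno}


def find_mz_chunks(data: bytes) -> list[dict]:
    """Scan for every b"MZ" and keep only the matches with a plausible header."""
    chunks = []
    pos = data.find(b"MZ")
    while pos != -1:
        hdr = parse_mz(data, pos)
        if hdr is not None:
            chunks.append(hdr)
        pos = data.find(b"MZ", pos + 1)
    return chunks
```

    On CIV.EXE one would expect 24 surviving chunks (the main program plus 23 overlays) if the heuristic holds.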

    I wrote a small program to extract each overlay into a standalone overlay file (e.g. .OVR/.OVL) for further processing.

    Each extracted overlay is then loaded into the IDA project through File > Load file > Additional binary file...


    After selecting a file, further details must be provided about how to load it:


    • Loading segment: this will be the overlay's segment base in the IDA project; IDA provides, by default, the first available multiple of 0x200 (1 page = 512 bytes) after the existing IDA project contents
    • Loading offset: this is the offset within the overlay segment where to import the overlay bytes (I think I always left it to 0)
    • File offset in bytes: this is the start point, in the selected file, where overlay bytes should be read from; warning: this must be the beginning of the overlay code, so the overlay header must be skipped
    • Number of bytes: this is the total number of bytes to read from the file
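    The "File offset in bytes" and "Number of bytes" values can be derived from each overlay's own MZ header. A minimal Python sketch, assuming each overlay chunk is a regular MZ image (last-page byte count at offset 0x02, page count at 0x04, header size in paragraphs at 0x08); `overlay_code_span` is a hypothetical helper and `off` is the chunk's "MZ" offset within CIV.EXE.

```python
from __future__ import annotations

import struct


def overlay_code_span(data: bytes, off: int) -> tuple[int, int]:
    """Return (file offset of the code, number of code bytes) for the MZ chunk at `off`.

    These correspond to IDA's "File offset in bytes" and "Number of bytes" fields,
    i.e. the chunk with its MZ header skipped.
    """
    magic, cblp, cp, _crlc, cparhdr = struct.unpack_from("<2s4H", data, off)
    if magic != b"MZ" or cp == 0:
        raise ValueError(f"no usable MZ header at 0x{off:X}")
    total = (cp - 1) * 512 + (cblp if cblp else 512)  # header + code, in bytes
    hdr_size = cparhdr * 16                            # header length in bytes
    return off + hdr_size, total - hdr_size
```

    For example, a chunk at 0x600 with a 4-paragraph header, 3 pages and 0x40 bytes in the last page yields (0x640, 1024).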

    In fact I wrote 2 IDC scripts:
    • 1 script to automatically rename all the overlay segments to "ovrXX" format (instead of IDA's default "segXXX"), starting from "ovr01": this script simply looks for the segment named seg033 (the first overlay, since the main CIV.EXE has 33 segments, from seg000 to seg032), and from then on renames all segments accordingly: ovr01, ovr02, ..., ovr23; it also marks each one as a 16-bit CODE segment and re-analyzes it

      Code:
      #include <idc.idc>
      
      static main()
      {
      	auto seg, loc, ovrseg;
      	auto off, base;
      	auto xref;
      	auto sn, sn0;
      	auto b;
      	auto segc, c;
      
      	// init seg to seg000
      	seg = FirstSeg();
      	// skip to next segment until reaching seg033 (= ovr01)
      	while(seg != BADADDR && SegName(seg)!="seg033")	{ seg = NextSeg(seg); }
      
      	// 'c' is the overlay index
      	c = 1;
      	while(seg != BADADDR)
      	{
      		// retrieving the current segment name segXXX - just for logging
      		sn = SegName(seg);
      		// building the new segment name as "ovrXX"
      		sn0 = "ovr"+((c<10)?"0":"")+ltoa(c,10);
      		// logging
      		Message("Renaming '%s' to '%s'\n",sn,sn0);
      		// renaming the segment
      		SegRename(seg,sn0);
      		// setting the segment as CODE
      		SegClass(seg,"CODE");
      		// setting the segment addressing as 16-bit
      		SetSegAddressing(seg,0); // 16-bit
      		// force segment analysis
      		AnalyzeArea(SegStart(seg),SegEnd(seg));
      
      		// proceed to next segment
      		seg = NextSeg(seg);
      		c++;
      	}
      }
      

    • 1 script to replace all occurrences of INT 3Fh with far calls to the routine in the overlay segments

      Code:
      #include <idc.idc>
      
      static main()
      {
      	auto seg, loc, ovrseg;
      	auto off, base;
      	auto xref;
      	auto sn, sn0;
      	auto b;
      	auto segc, c;
      
      	c = 0;
      	seg = FirstSeg();
      	segc = 0;
      	while(seg != BADADDR )
      	{
      		loc = SegStart(seg);
      		Message("Segment %s ...\n", SegName(seg));
      
      		// scan the segment code, looking for "CD 3F" sequence
      		// which stands for "int 3Fh"
      		while(loc < SegEnd(seg)) {
      			if( Byte(loc) == 0xCD && Byte(loc+1) == 0x3F)
      			{
      				Message("   found int 3Fh at %s - 0x%x(%d) [seg start: %d]\n", atoa(loc), loc,loc, SegStart(seg));
      
      				MakeUnknown(loc,5,1);
      
      				// Overlay call is CD 3F xx yy zz, where:
      				//    xx is the overlay number
      				//    yy zz is the function offset inside the overlay, little endian mode
      				sn0 = "ovr"+(Byte(loc+2)<10?"0":"")+ ltoa(Byte(loc+2),10);
      				Message("     - referenced overlay: 0x%x(%d) -> %s ; overlay function offset: 0x%x(%d)\n", Byte(loc+2),Byte(loc+2),sn0, Byte(loc+3)+(Byte(loc+4)<<8),Byte(loc+3)+(Byte(loc+4)<<8));
      				
      				ovrseg = SegByBase(SegByName(sn0));
      
      				Message("     - overlay segment: %d; address: %d \n", ovrseg, GetSegmentAttr(ovrseg,SEGATTR_START));
      				Message("     - far call patch (hex): %x %x %x %x %x\n", 0x9A, Byte(loc+3), Byte(loc+4), (ovrseg>>4)&0xFF, (ovrseg>>12)&0xFF );
      
      				// Far call is 9A ww xx yy zz , where:
      				//    ww xx is the function offset inside the overlay, little endian mode = directly copied from CD 3F entry
      				//    yy zz is the function's segment base, little endian mode
      				PatchByte(loc,0x9A);
      				PatchByte(loc+1,Byte(loc+3));
      				PatchByte(loc+2,Byte(loc+4));
      				PatchByte(loc+3,(ovrseg>>4)&0xFF);
      				PatchByte(loc+4,(ovrseg>>12)&0xFF);
      
      				MakeCode(loc);
      			
      				loc = loc + 5;
      				segc++;
      				c++;
      			} else {
      				loc = loc + 1;
      			}	
      		}
      		Message(" -> patched %d INT 3Fh overlay calls in segment %s\n",segc,SegName(seg));
      		seg = NextSeg(seg);
      		segc = 0;
      	}
      	Message(" Total patched INT 3Fh overlay calls: %d\n",c);
      }
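    The five-byte rewrite done by the second script can be checked in isolation. A Python sketch (not IDC), assuming, like the script, that the overlay byte maps directly to the target segment and that `overlay_linear` is that segment's linear start address in the IDA database (what `SegByBase(SegByName(...))` yields in the script): the far call's segment word is that address divided by 16. `int3f_to_far_call` is a hypothetical name.

```python
def int3f_to_far_call(stub: bytes, overlay_linear: int) -> bytes:
    """Turn 'CD 3F nn lo hi' into 'CALL FAR seg:off', encoded as '9A lo hi seg_lo seg_hi'."""
    if len(stub) != 5 or stub[0] != 0xCD or stub[1] != 0x3F:
        raise ValueError("expected the 5 bytes CD 3F nn lo hi")
    if overlay_linear % 16:
        raise ValueError("an overlay segment must start on a 16-byte paragraph")
    seg = overlay_linear >> 4  # real-mode segment value = linear address / 16
    if seg > 0xFFFF:
        raise ValueError("segment value does not fit in the 16-bit far call operand")
    # The function offset (lo, hi) is copied unchanged; the segment follows, little-endian.
    return bytes([0x9A, stub[3], stub[4], seg & 0xFF, seg >> 8])
```

    With an overlay segment starting at linear 0x2A000, `CD 3F 05 34 12` becomes `9A 34 12 00 2A`, i.e. `call far 2A00:1234`. This matches the script's `(ovrseg>>4)&0xFF` and `(ovrseg>>12)&0xFF` bytes.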
      


    I wish I had more time to describe it properly (or to continue the tutorial, for that matter), but I hope this will help you anyway.
     


  4. MM1nd

    MM1nd Chieftain

    This helps. A lot. Thank you.

    For some reason, IDA seems to have done this all by itself for the exe I am investigating (Sid Meier's RRT for the curious). Currently I am on a different machine, but I will investigate this later.

    Do you by chance know whether this depends on how overlays are actually managed in the code? That is, assuming there might be differences between RRT and CIV; I have not looked into that yet.

    I'll find my way from here, I think. Thank you again.
     
  5. darkpanda

    darkpanda Dark Prince

    That means your version of IDA is more recent than mine, most likely.

    I'm not sure I understand your question, but please note: my script is just a trick so that the full executable can be seamlessly processed in IDA, as if overlays were just other code segments.

    At runtime (when CIV.EXE is running), the management of overlays is completely different:
    • at startup, an interrupt handler is loaded by CIV.EXE to take care of calls to interrupt 0x3F (this code comes from the MSVC compiler, nothing to do with CIV)
    • a single segment is dedicated to overlay loading (seg020 for CIV.EXE)
    • whenever an overlay call is triggered (instruction INT 3Fh), the CPU dispatches it through the interrupt vector table to the interrupt handler installed at startup
    • the interrupt handler operates as follows:
      • read the next 3 bytes after the INT 3Fh instruction
      • interpret them as overlay number N (byte 1) and function offset <offset> in the overlay (bytes 2 and 3, little-endian)
      • read CIV.EXE from the disk and locate overlay N (overlay info data is stored in CIV.EXE as well by MSVC compiler, such as overlay sizes, overlay numbers, etc.)
      • import contents of overlay N into the dedicated segment (seg020), possibly overwriting any code previously loaded in there from another overlay
      • finally, call function at seg020:<offset>
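    To make that sequence concrete, here is a toy Python model of the handler steps above, not the actual MSC overlay manager: the dictionary stands in for the overlay data read from CIV.EXE, and `slot_segment` is a hypothetical segment value for the dedicated overlay segment (seg020 is only IDA's name for it). As described, it reloads the overlay on every call.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyOverlayManager:
    """Toy model of the INT 3Fh handler steps (not the real MSC code)."""

    overlays: dict[int, bytes]  # overlay number -> code bytes (stands in for CIV.EXE on disk)
    slot_segment: int           # segment value of the dedicated overlay segment
    slot: bytearray = field(default_factory=bytearray)
    disk_reads: int = 0

    def int3f(self, operand: bytes) -> tuple[int, int]:
        """Handle INT 3Fh given the 3 bytes that follow it; return the (segment, offset) called."""
        n = operand[0]                                   # byte 1: overlay number
        offset = int.from_bytes(operand[1:3], "little")  # bytes 2-3: function offset
        self.slot[:] = self.overlays[n]                  # "read from disk", overwrite the slot
        self.disk_reads += 1
        return self.slot_segment, offset
```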

    Every time an INT 3Fh occurs, CIV.EXE reads itself from the disk and imports code from the overlay into its dedicated overlay segment.
    This is a way to break the memory limitation on code size, at the expense of slow disk access.
    This is also why overlays mostly contain infrequently used functions: map generation, load/save game, government reports (TRADE, DEFENSE, ...).
     
  6. MM1nd

    MM1nd Chieftain

    Oh I see. I was operating under the assumption that the overlay handler was custom code.
     
  7. darkpanda

    darkpanda Dark Prince


    Indeed it is not custom: the concept of overlays was standard enough that the MS-DOS specifications reserved interrupt 3F for overlay management, although they did not provide a default implementation.

    Borland compilers (Turbo C) insert their own flavour of overlay management in compiled EXEs, which is much better documented on the web.

    All in all the MSVC flavour was not too hard to decipher, though.
     
  8. RobinHood70

    RobinHood70 Chieftain

    Joined:
    Jul 3, 2013
    Messages:
    7
    Sorry to necropost, but I'm having this exact issue with another old DOS-based game I'm looking at, and I seem to be missing something here. If I load the overlays as additional binary files the way you described, no relocation info is loaded, so the loaded code is incorrect. I tried extracting the overlays into their own executables, but of course that doesn't work either. If you're still around (I see you were last active about a month ago), or if someone else has a good understanding of this, could you please describe the process in more detail? How do I load the code into a segment of the main EXE while still keeping the relocatable references correct? Thanks!
     
  9. darkpanda

    darkpanda Dark Prince

    I haven't touched IDA or CIV in more than a year (maybe even 2 years), so it's hard to give a definitive answer... Could you describe your problem in more detail?
     
  10. fenugrec

    fenugrec Chieftain

    Joined:
    Oct 26, 2017
    Messages:
    3
    Gender:
    Male
    Sorry for poking this again, but I think there's some highly relevant information here, and more details to be discussed!
    I might have a bit to add regarding overlays, as well as some questions. I'm also involved in analyzing a program compiled with overlays. Thanks to darkpanda for some very good hints! To go a bit further:

    - overlaid programs compiled with old Microsoft C compiler versions (4, 5, 6, 7, and probably at least up to Visual C++ 1.x?) can use an overlay manager like the one darkpanda described. I eventually found out that the source code for this manager was included in the VCC 1 release! Google for "vccrt1.zip", which contains the file "OVLM6L.ASM" in MISC\DOS\.

    - the INT 3F calls are mostly as darkpanda described, but with a slight difference: the first byte after CD 3F is called the "segment number" and is not necessarily mapped 1:1 to the "overlay number". There's actually a lookup table ("Segment number to overlay number" in the source code) to get the overlay number.
    Here's an example of what I mean, from my own project (not civ) :

    Code:
    int3F [segnumber] [offset]    => ovl_number (from LUT)
    CD 3F    3A    02 00    =>16
    CD 3F    3B    2C 01    =>16
    CD 3F    3B    06 00    =>16
    CD 3F    3C    00 00    =>17
    
    And of course the "overlay number" refers to the individual "chunks" (each with their own MZ header and relocations) packed at the end of the main .exe.
    I assume in CIV the mapping must have happened to be 1:1 but that would depend on how the .exe was linked originally. Presumably if multiple units were linked/combined per overlay (say, 2-3 source files) this might explain what I'm seeing -- perhaps each "segnumber" corresponds to one source file. Just speculation.
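    Decoding a call with that extra indirection could look like the following Python sketch. `decode_int3f` is a hypothetical helper, and the lookup table in the usage below simply reuses the four values from the table above; in a real program it would have to be read from the overlay manager's data.

```python
from __future__ import annotations


def decode_int3f(stub: bytes, seg_to_ovl: dict[int, int]) -> tuple[int, int, int]:
    """Decode 'CD 3F ss lo hi' into (segment number, overlay number, function offset)."""
    if len(stub) < 5 or stub[0] != 0xCD or stub[1] != 0x3F:
        raise ValueError("not an INT 3Fh overlay call")
    segno = stub[2]                           # "segment number", not the overlay number
    offset = stub[3] | (stub[4] << 8)         # little-endian function offset
    return segno, seg_to_ovl[segno], offset   # overlay number comes from the lookup table
```

    Usage: with `lut = {0x3A: 16, 0x3B: 16, 0x3C: 17}`, `decode_int3f(bytes.fromhex("cd3f3b2c01"), lut)` gives `(0x3B, 16, 0x12C)`.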

    - the ovl manager allows calls *between* overlays, which is nifty but makes analysis more complicated. It handles this by maintaining its own call stack so that on calls / returns between overlays it can swap in the correct chunks. So the best approach is to "flatten" the .exe like darkpanda did: although it will no longer be a valid DOS .exe, it will serve perfectly for static analysis. There are some issues, though, if the whole thing goes beyond FFFF:FFFF (the 1MB real-mode address space): IDA is quite happy to map stuff at 10000:0000 and higher, but the call fixups will be impossible, i.e.:
    Code:
    ;original opcode:
    CD 3F [segno] [offset_lo] [offset_hi]    ;this is "int 3F" followed by 3 bytes to identify destination
    
    ;change to a "call far" opcode
    9A [addr_l] [addr_h] [seg_l] [seg_h]    ;obviously can't point to a seg > 0xFFFF !
    

    - one issue I'm having (same as RobinHood70 mentioned above) : the overlays are meant to be loaded at a specific location.
    Code:
           layout in memory
           (when running)
    
    +---------------------------+ IMG_BASE
    |                           |
    |    main code (overlay 000)|
    |                           |
    |                           |
    +---------------------------+ OVL_BASE
      | overlays loaded here    |
      | (initially all 0x00)    |
      |                         |
      |                         |
    +---------------------------+ OVL_BASE + max_ovl_size
    |                           |
    |   rest of main code       |
    |   (ovl 000)               |
    |                           |
    |   also data segment etc.  |
    |                           |
    |                           |
    +---------------------------+ SS:0000
    |                           |
    |       stack               |
    |                           |
    +---------------------------+ initial SS:SP (top of stack)
    
    The overlay manager, when swapping in a new ovl, goes through its relocation table just as happens when a regular .exe is loaded. The problem is: if we force the overlays to be loaded after the main image, as happens in IDA when doing "load additional binary", the overlay's relocations are not parsed. So if, for example, the overlay calls functions in the main code (ovl_000), those cross-references will be invalid in IDA. The same goes for global/shared data.

    darkpanda: I'd be interested to hear more about how you dealt with this. Perhaps it wasn't an issue in civ, or you were able to simply ignore the problematic refs?

    RobinHood70: depending on darkpanda's answer, I have a plan to deal with the relocations manually, because it's a problem for my project. It would be to load the overlays at the end of the image (before the stack; I'd sabotage the stack ptr afterwards since it's not needed for static analysis). Then for the relocs, they would need to be adjusted individually depending on where they pointed to (before, inside, or after the overlay image). This could probably be done in an IDC script.
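    A minimal Python sketch of that per-relocation adjustment, under assumptions nobody in the thread has confirmed: (1) an overlay chunk's relocation entries are (offset, segment) pairs relative to the start of that chunk's image, and (2) the words they point to hold segment values relative to the program's load base, laid out as in the diagram above, so values inside the overlay slot [OVL_BASE, OVL_BASE + size) refer to the overlay itself. Function names and the paragraph-based parameters are hypothetical.

```python
from __future__ import annotations

import struct


def read_relocs(data: bytes, off: int) -> list[tuple[int, int]]:
    """Read the (offset, segment) relocation pairs of the MZ chunk starting at `off`."""
    (crlc,) = struct.unpack_from("<H", data, off + 0x06)    # e_crlc: number of entries
    (lfarlc,) = struct.unpack_from("<H", data, off + 0x18)  # e_lfarlc: table offset
    return [struct.unpack_from("<2H", data, off + lfarlc + 4 * i) for i in range(crlc)]


def remap_segment(seg: int, ovl_base: int, ovl_paras: int, new_base: int) -> int:
    """Move segment values that fall inside the overlay slot to the overlay's new home.

    Values before or after the slot [ovl_base, ovl_base + ovl_paras) point into the
    main code/data and are left alone (assumption: the root image keeps its layout).
    """
    if ovl_base <= seg < ovl_base + ovl_paras:
        return new_base + (seg - ovl_base)
    return seg


def apply_relocs(image: bytearray, relocs: list[tuple[int, int]],
                 ovl_base: int, ovl_paras: int, new_base: int) -> None:
    """Rewrite each relocated segment word of one overlay image (header already stripped)."""
    for off, seg in relocs:
        pos = seg * 16 + off  # where the word sits inside this overlay image
        (value,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos,
                         remap_segment(value, ovl_base, ovl_paras, new_base))
```

    The entries themselves would still need to be rebased into the flattened image's relocation table; that bookkeeping is left out here.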
     
  11. darkpanda

    darkpanda Dark Prince

    Hi there,

    Unfortunately I really cannot spend time on your issue, but given your efforts I couldn't help reply...

    Quite frankly, I don't remember having specific issues with the cross-references you mention. I should check the CIV.EXE contents to make sure of this, but I am pretty sure that the relocation table and relocation entries in the code were already referencing the proper main code segments "as if" the overlays were inserted in the right code position (from memory, it was seg020 in CIV, I may be wrong).

    This being said, it is entirely possible that I manually "flagged" each relocation entry to map to the proper code segment - from memory, again, that would not have been a problem for me, I can almost remember the shortcut key to do this (was it CTRL-S ? Or CTRL-D ? ...)

    Not sure any of this was helpful, but I want you to know that I still regularly check out the forums, and I am still optimistic I will find time, someday, to spend significant effort on my unfinished works :)

    Cheers and good luck.

    DD
     
  12. fenugrec

    fenugrec Chieftain

    Awesome, thanks for checking in and taking the time to reply. Perhaps your choice of IDA options gave you better results than mine, too...
    In the meantime I hacked together a program to "unfold" an overlayed .exe into a monolithic, un-executable but analyzable .exe.
    https://github.com/fenugrec/overlazy

    It's almost finished, but there's a problem when replacing "int 3F" calls with "call far ...": I forgot to add those calls to the reloc table! The solution is fairly simple: in fixup_int3F() I just need to add a relocation entry for every occurrence.
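    For each patched call, the relocation entry has to point at the segment word of the new 9A instruction, i.e. 3 bytes after the opcode. A Python sketch of building such entries (hypothetical helpers, not the actual fixup_int3F() code from overlazy); positions are assumed to be relative to the start of the load image, after the MZ header.

```python
from __future__ import annotations

import struct


def far_call_reloc(opcode_pos: int) -> tuple[int, int]:
    """Relocation entry (offset, segment) for the segment word of a 9A far call.

    `opcode_pos` is the position of the 0x9A byte in the load image; the segment
    word of the call starts 3 bytes later (9A off_lo off_hi seg_lo seg_hi).
    """
    target = opcode_pos + 3
    seg, off = divmod(target, 16)  # any seg:off with seg * 16 + off == target is valid
    if seg > 0xFFFF:
        raise ValueError("target lies beyond what a 16-bit segment can reach")
    return off, seg


def pack_relocs(entries: list[tuple[int, int]]) -> bytes:
    """Serialize entries in MZ relocation-table order: offset word, then segment word."""
    return b"".join(struct.pack("<2H", off, seg) for off, seg in entries)
```

    For a call at image offset 0x1234, this gives the entry 0123:0007, stored as the bytes `07 00 23 01`.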
     
  13. darkpanda

    darkpanda Dark Prince

    Be careful: enlarging the relocation table may also shift the starting offset of the code (which affects the EXE header values).

    In fact I had written the exact same utility in my scrap IDE, but never bothered publishing it... Good job for the git :)
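    The header bookkeeping behind that warning, as a Python sketch with a hypothetical `grow_header` helper: each relocation entry takes 4 bytes, the header is rounded up to whole 16-byte paragraphs (e_cparhdr), and the page count and last-page byte count (e_cp, e_cblp) have to cover header plus image.

```python
def grow_header(lfarlc: int, n_relocs: int, image_size: int) -> dict:
    """Recompute MZ size fields after the relocation table grows to n_relocs entries.

    lfarlc:     offset of the relocation table inside the header (e_lfarlc)
    image_size: size in bytes of the load image that follows the header
    """
    hdr_bytes = lfarlc + 4 * n_relocs          # 4 bytes per (offset, segment) entry
    cparhdr = (hdr_bytes + 15) // 16           # e_cparhdr: header size in paragraphs
    total = cparhdr * 16 + image_size          # bytes that e_cp/e_cblp must describe
    return {
        "e_cparhdr": cparhdr,
        "e_cp": (total + 511) // 512,          # 512-byte pages, the last one may be partial
        "e_cblp": total % 512,                 # bytes used in the last page (0 = full page)
        "code_offset": cparhdr * 16,           # where the image now starts in the file
    }
```

    For instance, 1000 relocations after a table at 0x1C push the code start to file offset 4032 (252 paragraphs).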
     
  14. fenugrec

    fenugrec Chieftain

    Oh yes, I already need to tweak the MZ headers, so no big deal.
    The int 3F fixups are now working, and it's a fun sight to see all the xrefs come together automagically in IDA now :P

    I had to do one ugly trick to keep things "simple": normally a sane .exe reserves stack space by setting its initial SS:SP pointer somewhere beyond the image data, so the .exe file is much smaller than the total memory requirement. The new .exe I create instead contains the original stack area (i.e. as if it were initialized data), and sets the new SS:SP to point a few bytes after all the concatenated overlay image data, so that it looks like a legit stack in IDA.

    It looks a bit like this :
    Code:
       new "flattened" .exe
                                        file offset
    +---------------------------------+ 0
    | MZ header                       |
    +---------------------------------+ 0x1C
    | relocation table:               |
    |    - original OVL_000 relocs    |
    |    - adjusted OVL_XXX relocs    |
    |    - new relocs for call fixups |
    |                                 |
    +---------------------------------+ (hdr_parags * 0x10)
    |  main/root image (OVL_000)      |
    +---------------------------------+
    |  blank area (0x00 filled)       |
    |  (original stack area / BSS )   |
    |                                 |
    +---------------------------------+ (hdr_parags * 0x10) + (original SS:SP)
    |  concatenated images of         |
    |   OVL_001...OVL_XXX             |
    +---------------------------------+
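    The offsets in that diagram can be computed as in the Python sketch below. It is an illustration with a hypothetical `flattened_layout` function, not code from overlazy; it assumes each overlay image is padded to start on a 16-byte paragraph so it can be addressed as its own segment, and that offsets are relative to the start of the load image (file offset hdr_parags * 0x10).

```python
from __future__ import annotations


def flattened_layout(root_size: int, ss: int, sp: int, ovl_sizes: list[int]) -> list[int]:
    """Load-image offsets where each overlay image starts in the flattened .exe.

    root_size: size of the root image (OVL_000) in bytes
    ss, sp:    original initial SS:SP; the area up to SS:SP is kept as zero-filled space
    ovl_sizes: sizes of the OVL_001..OVL_XXX images, in order
    """
    pos = max(root_size, ss * 16 + sp)  # first byte after root image + stack/BSS area
    starts = []
    for size in ovl_sizes:
        pos = (pos + 15) & ~15          # round up to a paragraph boundary (assumption)
        starts.append(pos)
        pos += size
    return starts
```

    Each start divided by 16 is then the segment value (relative to the load base) that the int 3F fixups would target.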
    Good, now I can get to work doing actual RE ...
     
  15. tupi

    tupi Chieftain

    Joined:
    Jun 25, 2011
    Messages:
    71
    Location:
    Russia
    I understand that darkpanda probably won't see it, but I will try my luck.

    So, I have a problem with overlays.

    Addressing in the main program starts from 1000:0000. Let's look at a function call, for example to the concatenation function:
    9a 22 1e 45 20 CALLF FUN_strcat
    Everything is fine: at 2045:1e22 we actually have this function...
    But in the overlays, addressing starts from 0000:0000. So, for example, the same concatenation function:
    9a 22 1e 45 10 CALLF FUN_wrong_place_for_strcat
    As a result, it calls a totally wrong place in memory.
    And I have no idea what to do about this. I'd be interested to know whether darkpanda ran into such a problem. Maybe in IDA it can be solved somehow, I don't know. I can't find a way to fix it in Ghidra. It seems the only way to go is something like this: fix all CALLF instructions in all overlays (while they're still raw binaries) and only then feed them to Ghidra. So I should write a script to search for all these calls and then add 0x10 to the last byte. Something like that.

    What I'm doing now is manually checking every function and writing the correct function name in a comment. Fortunately, it's possible to jump to a function from its name in a comment. But these wrong calls create havoc in the main program: some incorrect functions appear in totally unexpected places.
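    If the overlays carry their own MZ relocation tables, as described earlier in the thread, a safer alternative to searching for 9a bytes is to patch only the words listed in each overlay's relocation table, since a 0x9A byte can also appear inside data or other instructions. A Python sketch with a hypothetical `rebase_segment_words` helper; `delta` is the segment base Ghidra gave the main program (0x1000 here), and the usage reuses the bytes from the post above.

```python
import struct


def rebase_segment_words(image: bytearray, relocs, delta: int) -> int:
    """Add `delta` to every 16-bit segment word listed in an MZ relocation table.

    relocs: (offset, segment) pairs, relative to the start of `image`
    delta:  segment base to add (e.g. 0x1000 if the main program sits at 1000:0000)
    Returns the number of patched words.
    """
    patched = 0
    for off, seg in relocs:
        pos = seg * 16 + off
        (value,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (value + delta) & 0xFFFF)
        patched += 1
    return patched
```

    With the entry (3, 0) pointing at the segment word, `9a 22 1e 45 10` becomes `9a 22 1e 45 20`, i.e. the call now targets 2045:1e22.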
     
    Last edited: Sep 4, 2020
  16. darkpanda

    darkpanda Dark Prince

    Hi tupi,

    I still regularly check the forum, mostly when I receive alerts on multiple threads I subscribed to long ago :p

    So I see your problem; unfortunately I really don't know how to help you at the moment... In particular, I did not try out Ghidra on CIV.EXE, so there could be special tricks or scripts to set up...

    Keep the flame alive and good luck! :)
     
  17. tupi

    tupi Chieftain

    Yay, thank you anyway. Always glad to see you. Btw, your explanation about 0x3f interrupts and overlays was very helpful.

    Actually, now I'm thinking I will simply search and replace all 9a calls manually, one by one (there are usually about 5-10 different functions used in an overlay, and maybe about 50 functions used across all overlays in total), and after that I will add the overlays to the project. Yes, I'll do that.
     
    Last edited: Sep 4, 2020
