Tutorial: CIV+IDA - Part 2

darkpanda

Dark Prince
Joined
Oct 28, 2007
Messages
823
This thread is PART 2 of a tutorial about reverse-engineering CIV.EXE with the help of IDA. You can directly access other parts using the links below:


Note that this tutorial is split between different posts/threads because of the limitation of this forum's 30,000 characters per post limitation

STEP 3: First bits of assembly



3.1. First assembly instruction: move!

In the previous step, we ended up creating our first cross-reference, between a well-know game string "...Break treaty...", and seemingly the only portion of the code that uses this string.

Now we're going to look at this portion of the code in more details:


The cross-reference we created is as follows (in Text view):

Rich (BB code):
seg011:0CD9 004    mov     ax, offset aCancelAction_B ; "!\n Cancel action.\n Break treaty.\n"

This line is an atomic assembly instruction:
  • We already know that seg011:0CD9 is the segmented address of the instruction
  • Let's forget about 004 for the moment (but if you're really curious, it is the current stack height)
  • Then there is mov which is the instruction's mnemonic, meaning a user-friendly way to remember what the instruction does
  • Afterwards, ax, offset aCancelAction_B are the instruction's operands
  • Finally, the line has an automatically generated comment which previews the string contents

Whichever way you start taking on assembly, the mov instruction will be one of the first you'll see: it's basically an assignment operation, meaning it assigns a value to... something.

In IDA, the direction of data between an instruction's operands is from right to left.
So it must be read: mov <destination>, <source>, which means "copy the value of <source> into <destination>".
Note: this convention is often called the Intel snytax, and follows most programming language conventions where variables initializations look like "a = 1".

Next, the <source> of our instruction is offset aCancelAction_B. Obviously, this means the offset of the value named aCancelAction_B - this is how IDA directly exposes the cross-reference to you. But in the actual code, as we saw before creating the cross-reference, this was just the plain hex offset value 2ADAh.
Such operands, which are plain hex values are called immediate values, meaning that the operand value is directly hard-coded in the instruction.

Finally, the <destination> of this instruction is ax... So, what is that?

All you really need to know is that ax is a register, which is a tiny memory area located inside the processor.

Also, don't get confused by the mnemonic 'mov': what's really happening is a copy of the value from <source> to <destination>, not a move.

3.2. Second assembly instruction: push and the stack

3.2.1. The Hex view

Let's now look at the next instruction:
Rich (BB code):
seg011:0CDC 004                 push    ax

We see that:
  • its segmented address is seg011:0CDC
  • the stack height is the same: 004
  • its mnemonic is push
  • it has a single operand: ax

First, we see that that this instruction's segmented address is 3 bytes after the previous instruction address:
Rich (BB code):
seg011:0CD9 + 3 = seg011:0CDC

This is a good opportunity to discover the Hex view of the code, by clicking on the tab Hex view A:


This brings us to the hex view of the current code, with the currently selected instruction's bytes highlighted:


We can do the same with the previous instruction, mov ax, offset aCancelAction_B, by selecting it in the text view:


... and then showing the hex view:


Here we can see 2 things:
  • Instructions can have a various number of bytes: mov ax, offset aCancelAction_B has 3 bytes (B8 DA 2A), while push ax has just 1 byte (50)
  • The currently highlighted instructions has all its bytes highlighted when selected; this helps make out groups of bytes belonging to individual instructions when looking at the hex view

Now let's look a little closer at the actual instruction: push ax

What does 'push <something>' do? The blunt answer is: it puts <something> at the top of the stack.

Quite clear isn't it? What is the 'stack', you ask? Well...

3.2.2. darkpanda's invaluable, public domain, yet very awkward, 'room metaphor' of processor architecure

Ok, I don't think there's any eay way around this one, so let's be creative... Consider this (very awkward) metaphor:
  • the computer's processor is as a very small room where at most 1 person can fit at any time.
  • it's the only room that has light, thus the only room where people can work, and the only room that has air (I said 'very awkward'), thus the only room where people can speak.
  • outside the room everything is black and empty, except for a light sign on the door who summons a person to come into the room

Now, all workers have a dedicated task, that they know how to perform considering given task requirements. For example:
  • One worker is an 'artist', he knows how to draw: just tell him what he should draw, and where he should draw it (on a given piece of paper)
  • Another worker is the 'art director': he doesn't know how to draw, but he decides all the things that should be drawn, and where they should be drawn

The art director is in the room: he needs something to be drawn, and for this he will need to call the artist... But how can he tell the artist what to draw and where when he cannot be in the same room at the same time or talk to him outside the room?
Well, here's what he does:
  • He writes down what must be drawn on a piece of paper, and puts the piece of paper on top of a paper pile
  • He also writes down where it should be drawn on a piece of paper, and puts the piece of paper on top of that very same a paper pile
  • He then lights on the door sign with the 'artist' name, for the artist to come in
  • Just before leaving, he puts a last piece of paper on the pile with his own name
  • Finally he leaves the room

Now the artist comes in the room:
  • He takes the first paper at the top of the pile, and puts it aside for later
  • He takes the next paper at the top of the pile, and by convention he knows that this is the position where he should start drawing
  • He then takes the next paper on the pile, and knows this is what he should draw
  • He draws...
  • When he's finished, he looks at the first paper he put aside before and, still by convention, knows he should summon into the room the person whose name is on the paper: he lights up his name on the room door
  • He leaves the room

The art director comes back into the room, and writes new pieces of paper for further things to draw, etc.

If you didn't understand the example above, feel free to comment or contact me privately.
Otherwise, let me clarify what the metaphor elements are:
  • Obviously, the room is the processor (somehow)
  • The workers, as you may have guessed, are program functions, also called sub-routines in assembly (and in IDA)
  • The pile of paper is, you've guessed it, the stack
  • Finally, the pieces of paper are, respectively, the function's arguments, and the caller ID

3.2.3. The stack

More formally, the stack is a memory area, whose main purpose is to implement the functional programming paradigm. In this paradigm, different parts of the program - the functions - interact with each other by exposing their signature (input arguments and output type) while hiding their internal mechanism from the other functions. In short, the stack is essentially used for function calls (not only, though, but never mind...)

Getting back to our assembly instruction, we saw that push <something> is used to put something at the top of the stack.
Conversely, getting what's currently on top of the stack is done using instruction pop <somewhere>, which actually puts the top-of-the-stack value, removesit from the stack, and puts it into <somewhere>.

To make more sense out of those mnemonics, you may as well see the stack as a spring-actioned plastic toy-gun: you push ammo through the barrel, arming the spring, and then you pop it with the trigger, the last ammo pushed in begin the first one to pop out...

Note that the name 'stack' was to chosen to illustrate this 'add on top'/'take from top' access mechanism, which is more formally known as 'Last In, First Out' or LIFO.

3.2.4. The CPU view of the stack

The x86 processor has an explicit notion of where the stack is in memory, through the values stored in 2 other registers:
  • ss, which stands for stack segment, is the segment base of the stack in memory
  • sp, which stands for stack pointer, is the offset of the top of the stack relative to the stack segment base (ss)

Note that sp is negative in general: adding values on the stack (with push) makes sp decrease, while removing values from the stack (with pop) makes sp increase.
This may look very strange, until you discover that the stack is located at the end of the program's memory, as shown below:


The white area between the program's data and the stack is unused memory, in which the stack can expand more or less freely as functions are being called, and as those calls add arguments on the stack. You may notice that if too many functions are called in a row, the stack may use up all the unused memory and reach the code memory... This can typically happen in case of programming bugs where endless function loop call themselves forever. This results in an error called, you guessed it, a stack overflow.

3.3. Gearing up with the 3 next assembly instructions: call a sub-routine!

Now we're going to look at the next 3 instructions:

Rich (BB code):
Bytes:         | Text view:
B8 DA 2A       | seg011:0CD9 004 mov     ax, offset aCancelAction_B
50             | seg011:0CDC 004 push    ax
B8 26 C9       | seg011:0CDD 006 mov     ax, 0C926h
50             | seg011:0CE0 006 push    ax
9A 26 1E 58 20 | seg011:0CE1 008 call    sub_223A6

We already know about the first 2 instructions, they're identical to previous 2 instructions: mov a value into ax, then push the value of ax on top of the stack.

You may notice that the current stack height has increased by 2, from 4 to 6, after the previous push: since ax is a 16-bit register (did I mention that?), it contains 2 bytes, and when pushing it onto the stack, the stack height thus increases by 2.
Similarly, after the second push, the stack height increases from 6 to 8.

Now what we're REALLY interested in is the final instruction:
seg011:0CE1 008 call sub_223A6

Since you have already mastered the 'room metaphor' above, you already guessed that the call instruction is performing a sub-routine call, otherwise known as a function call.

You can also see that IDA has a named cross-reference for the called sub-routine, sub_223A6, derived from the sub-routine absolute address. Let's double-click the sub-routine name to have a glance of where it takes us:


Let's not bother right now with this sub-routine detailed contents, however we're going to look at the calls to this sub-routine, by selecting the sub-routine name, and pressing CTRL+X:


You see that IDA already detected 249 different calls to this sub-routine - which is quite a lot! This gives us a hint that this sub-routine must be quite a generic low-level function...

For the sake, we'll just double-click randomly on one of the other calls to this sub-routine in the calls list:


Quite interestingly, we stumble upon the same assembly block as the one we were just analyzing... The only difference is the value of the immediate operand of the first mov instruction. Even the second mov has the same operand as before.

We're quite safe to deduct that every such assembly block is a call to a function that takes a string and 'something else' as its arguments.

Because the stack access mechanism is LIFO, the first argument of the called sub-routine is actually the last value to be pushed onto the stack. So in our situation, we have:
  • function sub_223A6
    • argument 1: 'something' (always 0C926h)
    • argument 2: offset to a string

From wherever you are, press ESC to get back to sub_223A6 (or double-click on the cross-reference), to have a second look at the sub-routine code:


You see that IDA did identify that the sub-routine has 2 arguments, represented at the beginning of the sub-routine. Note that this part of the disassembly is not part of the original code, but rather meta-code information generated by IDA for better readability.

From what we discovered in the last step, let's cross-reference other strings used as arguments to this function:

At address seg003:0706:
At address seg003:0C86:
At address seg003:139E:
At address seg007:4CB9:
At address seg007:4CDF:
Manually creating such cross-references is actually a large part of the analysis process.

STEP 4: A complete sub-routine



But what is this 0C926h, and more importantly, what is this function?

In this step, we're going to completely explain the subroutine we identified previously, as an exercice to learn a handful of assembly instructions.

First, let's have a look at the entire sub-routine again:
It is quite small, compared to other gigantic functions, but still it contains interesting material.

We see that its address is seg019:1E26. As a matter of fact, IDA determined that this is where the sub-routine started precisely because it found call instructions referring to this address.

We also see that its ending address is seg019:1E64: IDA determined it by following the control flow of the sub-routine, until it reached the typical final instruction of a sub-routine: retf.

We will see later that IDA is not always very efficient in finding the proper boundaries of sub-routines, and we often have to manually indicate where the end of a sub-routine is...

4.1. Stack frame management (part 1)

Let's look at the first two instructions, as well the last two instructions:
Rich (BB code):
Bytes:| Text view:
55    | seg019:1E26 000 push    bp
8B EC | seg019:1E27 002 mov     bp, sp

...

5D    | seg019:1E63 002 pop     bp
CB    | seg019:1E64 000 retf

In plain English it means:
  • push bp: put the value of 'bp' onto the stack
  • move bp, sp: copy the value of 'sp' into 'bp'
  • ...
  • pop bp: put the value at the top of the stack into 'bp'
  • retf: terminate function, i.e. go back to the calling function

First, you've guessed it: bp is another register featured in the x86 architecture. Its name stands for base pointer.

You will find those 4 instructions, 2 at the start and 2 at the end, in nearly every sub-routine. Quite often, there will be one additional instruction at the beginning of a sub-routine (sub sp, <xx>) but let's keep it for later.

The basic principle of the base pointer is that it is used by a sub-routine as its memory reference point: it is its 'base' to access the sub-routine's local memory, which contains its arguments, and possibly its local variables. We will see this in further instructions below.

This memory area is called the function's stack frame.

4.2. More registers

If we look at the 2 subsequent instructions in the sub-routine (as well at the 2 instructions before the last 2), we discover a handful of new registers:

Rich (BB code):
Bytes:| Text view:
55    | seg019:1E26 000 push    bp
8B EC | seg019:1E27 002 mov     bp, sp
8B D7 | seg019:1E29 002 mov     dx, di
8B DE | seg019:1E2B 002 mov     bx, si

...

8B F3 | seg019:1E5F 002 mov     si, bx
8B FA | seg019:1E61 002 mov     di, dx
5D    | seg019:1E63 002 pop     bp
CB    | seg019:1E64 000 retf

First up, we have 2 mov instructions that copy values between registers:

  • di gets copied into dx,
  • and si copied into bx.

Those 4 registers are very similar to ax by nature: they contain 16 bits (2 bytes) and can be manipulated freely by many instructions.

Also, di and si have an added privilege of being used in more specific ways by some instructions. Typically, string manipulation instructions will use si as the character position in the source string, and di as the character position in the destination string. This is what there names actually stand for:
  • si = source index
  • di = destination index
We will see more about this later on.

Then, looking at the last 4 instructions, we see that right before the end of the sub-routine, si and di recover the values they had before the sub-routine started:

  • dx gets copied back into di,
  • and bx copied back into si.
So in the current function, dx and bx are merely used as temporary backup storage for the values of si and di, for the duration of the sub-routine only.

We're only missing one last register, cx, to complete the happy family of the eight 16-bit general purpose registers of the x86 architecture:

  • ax
  • bx
  • cx
  • dx
  • bp = "base pointer", pointer to sub-routine's stack frame
  • si = "source index", source pointer for string operations
  • di = "destination index", destination pointer for string operations
  • sp = "stack pointer", pointer to the current top of the stack

Those are called "general purpose" in the sense that most instructions can modify their value or use them as they see fit, which is not the case of other registers, that we will see below.

4.2. Segment registers

Let's continue with the next 2 instructions:

Rich (BB code):
Bytes:| Text view:
55    | seg019:1E26 000 push    bp
8B EC | seg019:1E27 002 mov     bp, sp
8B D7 | seg019:1E29 002 mov     dx, di
8B DE | seg019:1E2B 002 mov     bx, si[/color]
8C D8 | seg019:1E2D 002 mov     ax, ds
8E C0 | seg019:1E2F 002 mov     es, ax
      | seg019:1E31         assume es:dseg
...
8B F3 | seg019:1E5F 002 mov     si, bx
8B FA | seg019:1E61 002 mov     di, dx
5D    | seg019:1E63 002 pop     bp
CB    | seg019:1E64 000 retf

Ha! Here are an additional 2 registers we didn't see before: ds and es... But there is something strange here: the first instruction copies ds into ax, and then the second instruction copies ax into es...

Why not directly copy ds into es?

Well, this what I stated above: ds and es are not general purpose registers, they are are segment registers.
As such, it is impossible to make direct copies between their values using mov. However, it is still possible to copy the value of one into the other by using an intermediate general purpose register such as ax.

They are called segment registers because their values are used by the processor as segment bases:

  • ds = data segment
  • es = extra segment

While we're at it, let's bring over the other 2 members of the good old 16-bit segment registers family, cs and ss:

  • cs = code segment
  • ss = stack segment

After previous explanations about the stack, you already know that ss contains the stack's segment base.

But what about the 3 others?

Well, their names certainly gives a clue:

  • the code segment register cs contains the segment base of the code being currently executed
  • the data segment register ds contains the segment base of the default data memory area to use for data operations
  • the extra segment register es contains the segment base of... well, an extra data memory area to use together with ds for multi-data operations
Getting back to our sub-routine, we now have es containing the same segment base as ds, namely dseg.

Tutorial continues in next post
 
Last edited:
4.3. Sub-routine arguments

We have seen before how the stack is used to pass arguments of function calls. The next instruction shows us how arguments are actually used in the sub-routine:

Rich (BB code):
Bytes:   | Text view:
55       | seg019:1E26 000 push    bp
8B EC    | seg019:1E27 002 mov     bp, sp
8B D7    | seg019:1E29 002 mov     dx, di
8B DE    | seg019:1E2B 002 mov     bx, si
8C D8    | seg019:1E2D 002 mov     ax, ds
8E C0    | seg019:1E2F 002 mov     es, ax
         | seg019:1E31         assume es:dseg
8B 7E 06 | seg019:1E31 002 mov     di, [bp+arg_0]
...
8B F3    | seg019:1E5F 002 mov     si, bx
8B FA    | seg019:1E61 002 mov     di, dx
5D       | seg019:1E63 002 pop     bp
CB       | seg019:1E64 000 retf

We already know the mov instruction, but here we see it in a slightly different shape: its second operand is no longer an immediate value, but a memory address.

Indeed, whenever you see brackets around something, like [something], this means the value stored in memory at address something.

Here the operand is: [bp+arg_0]

Earlier we saw that IDA already detected the arguments arg_0 and arg_2 as follows:
Rich (BB code):
seg019:1E26     arg_0           = word ptr  6
seg019:1E26     arg_2           = word ptr  8

IDA replaced, for better readability, the offset value 6 with named argument arg_0.
So this instruction could as well be written as below:
Rich (BB code):
Bytes:   | Text view:
8B 7E 06 | seg019:1E31 002 mov     di, [bp+6]
...

Now, how does IDA know that [bp+6] is a value in the stack?

In fact, in x86 assembly, there are conventions for matching offset registers with segment registers, as follows:
  • bp is an offset relative to the stack segment ss
  • si is an offset relative to the data segment ds
  • di is an offset relative to the extra segment es
  • sp is an offset relative to the stack segment ss

This means that [bp+6] can be explicitely written ss:[bp+6]

Now the question is: how does ss:[bp+6] match the sub-routine's first argument?

Well, we know from the sub-routine's first 2 instructions that bp contains the value that was in sp right after the sub-routine is called.

So let's go again through the steps of a calling the sub-routine, assuming an arbitrary value of 0FF2Dh for stack pointer sp:

  1. The offset of a string is copied into ax:
    Rich (BB code):
    B8 DA 2A       | seg011:0CD9 006 mov     ax, offset aCancelAction_B
  2. Then this same string offset is added to the stack, and will be the second argument of the called sub-routine:
    Rich (BB code):
    50             | seg011:0CEC 004 push    ax
  3. After adding a value to the stack, as described before, the stack pointer decrements by the number of bytes of the added value; so now, we have: sp = 0FF2Dh - 2 = 0FF2Bh
  4. Next, another value is copied into ax:
    Rich (BB code):
    B8 26 C9       | seg011:0CDD 006 mov     ax, 0C926h
  5. ... and this value is also added to the stack, and will be the first argument of the called sub-routine:
    Rich (BB code):
    50             | seg011:0CE0 006 push    ax
  6. After adding this second value to the stack, the stack pointer decrements again by the number of bytes of the added value: sp = 0FF2Bh - 2 = 0FF29h
  7. Next, the sub-routine is called:
    Rich (BB code):
    9A 26 1E 58 20 | seg011:0CE1 008 call    sub_223A6
  8. As we described earlier, the function call adds the caller ID to the stack: this means that both the current code's segment base and current code's offset are added to the stack as well:
    1. the current code's segment base is stored in 16-bit register cs, as we saw earlier; after adding it, the stack pointer decrements again: sp = 0FF29h - 2 = 0FF27h
    2. the current code's offset is stored in another 16-bit register, that we didn't see before: it is the ip register, which stands for instruction pointer; after adding it, the stack pointer decrements again: sp = 0FF27h - 2 = 0FF25h
  9. Finally, the first instruction of the sub-routine is to store the current value of bp on top of the stack:
    Rich (BB code):
    55       | seg019:1E26 006 push    bp

After all the above happens, here is what the stack looks like:


Then, from the above illustration, we can clearly see that ss:[bp+6] matches the memory location where the first argument 0C926h was stored.

4.4. Zeroing a register

Next instruction, xor:

Rich (BB code):
Bytes:   | Text view:
55       | seg019:1E26 000 push    bp
8B EC    | seg019:1E27 002 mov     bp, sp
8B D7    | seg019:1E29 002 mov     dx, di
8B DE    | seg019:1E2B 002 mov     bx, si
8C D8    | seg019:1E2D 002 mov     ax, ds
8E C0    | seg019:1E2F 002 mov     es, ax
         | seg019:1E31         assume es:dseg
8B 7E 06 | seg019:1E31 002 mov     di, [bp+arg_0]
33 C0    | seg019:1E34 002 xor     ax, ax
...
8B F3    | seg019:1E5F 002 mov     si, bx
8B FA    | seg019:1E61 002 mov     di, dx
5D       | seg019:1E63 002 pop     bp
CB       | seg019:1E64 000 retf

If you know a little about boolean algebra, you already now the boolean operation XOR, which stands for eXclusive OR.

A quick reminder: given two boolean values A and B, A XOR B is true if and only if A and B are different.
In other words:
  • false XOR false = false
  • true XOR false = true
  • false XOR true = true
  • true XOR true = false

The assembly instruction xor is the bitwise equivalent of this boolean operator, operating separately on all the operands' bits (forgive the abusive, barbaric notation):
  • xor 0, 0 = 0
  • xor 0, 1 = 1
  • xor 1, 0 = 1
  • xor 1, 1 = 0
  • xor 100, 101 = 001
  • xor 011010, 100111 = 111101
  • xor 10001, 10000 = 00001
  • xor 11111, 10000 = 01111
  • etc.

The assembly xor keeps a property from its boolean XOR counterpart: whatever values A and B, if A equals B, then xor A, B = 0

So obviously, the result of xor ax, ax is 0.

First: as in most assembly instructions, the operands should be read as <destination>, <source>. So the actual result of this instruction is that register ax now contains value 0.

Second: why use a counter-intuitive way like xor ax, ax in order to set ax value to 0?
We could have as well used mov ax, 0, which is much clearer to anyone looking at the code...
The reason partly lies in the code bytes for those 2 instructions:
Rich (BB code):
Bytes:   | Text view:
33 C0    | xor     ax, ax

B8 00 00 | mov     ax, 0

You've guessed it: mov ax, 0 takes more bytes than xor ax, ax, for the same purpose!

In the rest of the code, there are many places where counter-intuitive instructions, or sequence of instructions, are used to achieve an objective in fewer bytes than what the 'intuitive' way would yield.
This is essentially a question of program optimization: less code bytes equates to faster execution (not always true, but anyway...).

Another typical zero-ing of the ax register that you will see around is sub ax, ax, but we will come back to this one later...

4.5. String operations

Let's look at the next 2 instructions:

Rich (BB code):
Bytes:   | Text view:
55       | seg019:1E26 000 push    bp
8B EC    | seg019:1E27 002 mov     bp, sp
8B D7    | seg019:1E29 002 mov     dx, di
8B DE    | seg019:1E2B 002 mov     bx, si
8C D8    | seg019:1E2D 002 mov     ax, ds
8E C0    | seg019:1E2F 002 mov     es, ax
         | seg019:1E31         assume es:dseg
8B 7E 06 | seg019:1E31 002 mov     di, [bp+arg_0]
33 C0    | seg019:1E34 002 xor     ax, ax
B9 FF FF | seg019:1E36 002 mov     cx, 0FFFFh
F2 AE    | seg019:1E39 002 repne scasb
...
8B F3    | seg019:1E5F 002 mov     si, bx
8B FA    | seg019:1E61 002 mov     di, dx
5D       | seg019:1E63 002 pop     bp
CB       | seg019:1E64 000 retf

No mystery on the first instruction, it stores immediate value 0FFFFh into register cx

The next instruction though, looks far more exotic: repne scasb

What the hell is that!

There's a lot going on here, so let's try and break it down into the basics: this instruction has actually 2 different parts:
  • repne: this an instruction prefix, which essentially means "repeat the following instruction while cx is not 0 AND 'not equal', and decrement cx"
  • scasb: this instruction is short for "scan string byte"; if you read its documentation, it says: "Compare al with byte at es:di, set status flags, then increment di"

Here we have a handful of things to address, so let's see them one by one:

4.5.1. 8-bit registers

al is a new register we didn't mention before... Or didn't we?

Actually we already know about it, indirectly: al is an 8-bit register constituting the low-order byte of register ax.

Similarly, ah is the 8-bit register constituting the high-order byte of register ax.

You will encounter the equivalent pairs of 8-bit registers for all 4 general purposes registers bx, cx and dx:
  • ax = ah:al
  • bx = bh:bl
  • cx = ch:cl
  • dx = dh:dl

That's it for 8-bit registers (I hope it's enough and won't get back biting at me...).

4.5.2. flags register

Another register we didn't talk about is the flags register: this is a 16-bit register just like ax, bp, es or ip, except that it is used for the meaning of its individual bits.

Each bit of the flags register has a specific meaning, which holds a boolean information (1 or 0, true or false) about the result of the previous operation. Thus, each bit of the flags register has its own individual 'flag' name:
  • bit 0 = cf: stands for carry flag, which is set to 1 when an addition generated a carry-over, or a substraction generated a borrow-from; 0 otherwise
  • bit 2 = pf: stands for parity flag, which is set to 1 when the number of 1's is even in the result of the last operation
  • bit 4 = af: stands for adjust flag, which is set to 1 when an addition generated a mid-value carry-over, or a substraction generated a mid-value borrow-from; mid-value means that the carry/borrow occured between the 2 bytes of the 16-bit result
  • bit 6 = zf: stands for zero flag, which is set to 1 when the last operation's result was 0;
  • bit 7 = sf: stands for sign flag, which is set to 1 when the last operation's result was negative; in "two's complement" binary notation, this is equivalent to say that the result's high-order bit was 1
  • bit 8 = tf: stands for trap flag, which is used for instrumentating the CPU (goes into single-step mode, useful for debugging, not our concern here)
  • bit 9 = if: stands for interrupt flag, which is used for handling interrupts (not really our concern here, but see below)
  • bit 10 = df: stands for direction flag, which determines the direction of string operations; if 1, SI and DI registers are incremented when a string operation occurs, otherwise they are decremented
  • bit 11 = of: stands for overflow flag, which is used to indicate that the result of an operation could not fit inside a 16-bit value; its actual meaning and operation is a bit complex, and not our concern here, so I'll just skip it

You don't need to remember all of this right now, but it's good to have a reference at hand. The most important flags, that will often need to deal with, are cf, zf and sf. Secondarily important would be of and if.

4.5.3. Comparisons

As mentioned in the description of this instruction, there are several comparisons occuring in this instruction:
  • repne compares the value of cx against value 0
  • scasb compares the value of al against the byte value, in memory, at address es:di

In assembly, there are many other places and shapes where comparisons occur (we will see later on), but they all come down to the same principle:
  • to compare X and Y, do:
  • compute the substraction X - Y, as in X minus Y
  • then check the result:
    • if the result is positive, that means X > Y, as in X is greater than Y
    • if the result is zero, that means X = Y, as in X is equal to Y
    • if the result is negative, that means X < Y, as in X is smaller than Y

This principle is rendered very convenient because checking the result can entirely rely on the zero flag zf and the sign flag sf:
  • if the result is zero, that means zf is 1, then X = Y
  • if the result is positive, that means zf is 0 and sf is 0, then X > Y
  • if the result is positive, that means zf is 0 and sf is 1, then X < Y

4.5.4. Back to repne scasb

With everything we just saw, let's look in details at what our instruction is going to do:

  • We saw that repne means "repeat until cx = 0 AND while 'not equal', and decrement cx"
  • Now we know that 'not equal' should be understood as 'zero flag zf is 0'
  • Next, scasb compares al with byte at es:di, sets status flags, then increments di
  • Again, we know that 'sets status flags' means:
    • set zf to 1 if al equals [es:di]
    • set zf to 0 if al is different from [es:di]
  • So if re-inject this last statement in the repne clause above, we can again re-phrase the instruction as: "repeat until cx = 0 AND while al is different from [es:di], and decrement cx, and increment di"
  • Grammatically, we can make this sentence even more straightforward and to the point:
    repne scasb means: decrement cx and increment di until cx = 0 OR al equals [es:di]

4.5.5. Actual result

Now let's look at the various registers involved:
  • cx was initialized to 0FFFFh, which is 65536 in decimal
  • ax was initialized to 0, and thus al is 0 as well
  • es was initialized to dseg
  • di was initialized to [bp+arg_0]

If we replace the actual values in the previous sentence, we get:
repne scasb means: decrement cx starting at 65535 and increment di starting at arg_0 until cx = 0 OR [es:di] equals 0 (al)

In pseudo-C code, this would be:
Rich (BB code):
int cx = 65535;
int di = arg_0;
while((cx--)!=0 && es[(di++)]!=0);

So when does this loop end? Either when it iterated 65535 times, in which case cx becomes 0, OR on the first time it the byte value at [es:di] is 0, starting at arg_0... Remember this for the next steps.

4.5.6. All repeat prefixes

For the record, here is the list of possible prefixes to string operations:
  • rep: means "repeat", iterates until cx = 0
  • repe: means "repeat if equal", iterates until cx = 0, and while zf = 1
  • repne: means "repeat if not equal", iterates until cx = 0, and while zf = 0
  • repz: means "repeat if zero", same as repe
  • repnz: means "repeat if not zero", same as repne

4.6. Re-addressing: lea

I must confess I am not very familiar with the next instruction, lea:

Rich (BB code):
Bytes:   | Text view:
55       | seg019:1E26 000 push    bp
8B EC    | seg019:1E27 002 mov     bp, sp
8B D7    | seg019:1E29 002 mov     dx, di
8B DE    | seg019:1E2B 002 mov     bx, si
8C D8    | seg019:1E2D 002 mov     ax, ds
8E C0    | seg019:1E2F 002 mov     es, ax
         | seg019:1E31         assume es:dseg
8B 7E 06 | seg019:1E31 002 mov     di, [bp+arg_0]
33 C0    | seg019:1E34 002 xor     ax, ax
B9 FF FF | seg019:1E36 002 mov     cx, 0FFFFh
F2 AE    | seg019:1E39 002 repne scasb
F2 AE    | seg019:1E3B 002 lea     si, [di-1]
...
8B F3    | seg019:1E5F 002 mov     si, bx
8B FA    | seg019:1E61 002 mov     di, dx
5D       | seg019:1E63 002 pop     bp
CB       | seg019:1E64 000 retf

If you look at its documentation, it says that lea stands for "load effective address", which stores the offset of a memory element into a register.

Here we have: lea si, [di-1]

So my understanding is that after this instruction, register si contains, actually, di-1...

I don't know what more to add here, really, so let's continue!

4.7. String operations, again

Now we look at the next 3 instructions:

Rich (BB code):
Bytes:   | Text view:
55       | seg019:1E26 000 push    bp
8B EC    | seg019:1E27 002 mov     bp, sp
8B D7    | seg019:1E29 002 mov     dx, di
8B DE    | seg019:1E2B 002 mov     bx, si
8C D8    | seg019:1E2D 002 mov     ax, ds
8E C0    | seg019:1E2F 002 mov     es, ax
         | seg019:1E31         assume es:dseg
8B 7E 06 | seg019:1E31 002 mov     di, [bp+arg_0]
33 C0    | seg019:1E34 002 xor     ax, ax
B9 FF FF | seg019:1E36 002 mov     cx, 0FFFFh
F2 AE    | seg019:1E39 002 repne scasb
F2 AE    | seg019:1E3B 002 lea     si, [di-1]
8B 7E 08 | seg019:1E3E 002 mov     di, [bp+arg_2]
B9 FF FF | seg019:1E41 002 mov     cx, 0FFFFh
F2 AE    | seg019:1E44 002 repne scasb
...
8B F3    | seg019:1E5F 002 mov     si, bx
8B FA    | seg019:1E61 002 mov     di, dx
5D       | seg019:1E63 002 pop     bp
CB       | seg019:1E64 000 retf

Do you notice? Those are almost the same as 3 of the instructions we saw before!

The only difference is that now, we're dealing with arg_2 instead of arg_0.

This is actually giving us a very useful information about arg_0: since it is manipulated in the same fashion as arg_2, and we know that arg_2 is the memory address of a string, we can safely assume that arg_0 is a string address as well!

4.7.1. Naming your cross-references

We will celebrate this finding by going back to the code that called the sub-routine, and create a cross-reference for this previously unknown argument: we go ESC to get back to the caller, click on the value, then press Alt+R and select dseg:


You see that IDA automatically gives a name to this data element, using its absolute address: byte_36AD6.

This name is really too vague to be useful, so we will give it a much more telling name such as unknown_string?: click on the cross-reference, and then press N:


Tutorial continues in next post
 
Last edited:
4.8. One more handful of instructions: not, sub and xchg

Now we look again at the next 3 lines, which bring 3 new instructions, which are fortunately quite straightforward to understand:

Rich (BB code):
Bytes:   | Text view:
55       | seg019:1E26 000 push    bp
8B EC    | seg019:1E27 002 mov     bp, sp
8B D7    | seg019:1E29 002 mov     dx, di
8B DE    | seg019:1E2B 002 mov     bx, si
8C D8    | seg019:1E2D 002 mov     ax, ds
8E C0    | seg019:1E2F 002 mov     es, ax
         | seg019:1E31         assume es:dseg
8B 7E 06 | seg019:1E31 002 mov     di, [bp+arg_0]
33 C0    | seg019:1E34 002 xor     ax, ax
B9 FF FF | seg019:1E36 002 mov     cx, 0FFFFh
F2 AE    | seg019:1E39 002 repne scasb
F2 AE    | seg019:1E3B 002 lea     si, [di-1]
8B 7E 08 | seg019:1E3E 002 mov     di, [bp+arg_2]
B9 FF FF | seg019:1E41 002 mov     cx, 0FFFFh
F2 AE    | seg019:1E44 002 repne scasb
F7 D1    | seg019:1E46 002 not     cx
2B F9    | seg019:1E48 002 sub     di, cx
87 FE    | seg019:1E4A 002 xchg    di, si
...
8B F3    | seg019:1E5F 002 mov     si, bx
8B FA    | seg019:1E61 002 mov     di, dx
5D       | seg019:1E63 002 pop     bp
CB       | seg019:1E64 000 retf

4.8.1. Iteration count with not

First, not is quite straightforward: it performs a bitwise NOT boolean operation on all its operand's bits.

There is something worth noting about using not in assembly: because the binary notation used in x86 assembly for negative values is [wiki]two's complement[/wiki], the result of a not instruction has an interesting property: NOT A is equal to -A-1 (as in minus A minus 1) for whatever value A, positive or negative.

For example, the result of not 0FFFFh would be 0. Indeed, in [wikipedia]two's complement[/wikipedia] notation, 0FFFFh is the 16-bit equivalent to the signed value -1. Its unsigned value is 65535. So not 0FFFFh is equivalent to -(-1)-1 = 1-1 = 0.

In the previous repeated instruction loop, cx is progressively decremented: 0FFFFh, 0FFFEh, 0FFFDh, 0FFFCh, etc.

If we look at the signed value of cx, it goes: -1, -2, -3, -4, etc. (while its unsigned value goes: 65535, 65534, 65533, 65532, etc.)

So, when the loop is over, after N iterations, the signed value of cx is -1-N (while its unsigned value is 65535-N).

Then, performing not cx will change the signed value of cx to -(-1-N)-1 = 1+N-1 = N.

So the bottom line here is: when using iterated string operations using rep* prefixes, you can get the number of iterations in cx by executing not cx right afterwards.

Pretty neat, isn't it?

And it is even neater if you think about it a little further: the iterated string operations, executed for both arg_0 and arg_2, terminate as soon as a '0' byte is encountered. Now we know that arg_0 and arg_2 are offsets to strings, so we can deduct with some confidence that the first '0' byte of a string marks the end of the string.

In other words: after this instruction, cx contains the length of the string whose offset is arg_2.

4.8.2. Some basic math, finally, with sub

The next instruction sub is, again, quite straightforward: it performs a subtraction between its 2 operands, storing the subtraction result into the destination operand.

So: sub A, B should be understood as, in pseudo-C notation: A = A - B; (or also A -= B;).

Here we have sub di, cx, so what happens is di = di - cx;

Wait a minute here... di was also modified during the repeated instruction repne scasb: its value started at [bp+arg_2] and was incremented at each iteration.

So if we list the values of di and cx for the first N iterations, here is what we have:
iteration| cx | di
0| 0FFFFh = -1|[ bp+ arg_2 ]
1| 0FFFEh = -2|[ bp+ arg_2 ]+1
2| 0FFFDh = -3|[ bp+ arg_2 ]+2
3| 0FFFCh = -4|[ bp+ arg_2 ]+3
...|...|...
N| 0FFFFh-N = -N-1|[ bp+ arg_2 ]+N

Interesting... Let's re-write this list of values, now adding not cx and sub di, cx:

iteration| cx | di | cx = not cx | sub di, (not cx)
0| 0FFFFh = -1|[ bp+ arg_2 ]|-(-1)-1 = 0|[ bp+ arg_2 ]-0 = [ bp+ arg_2 ]
1| 0FFFEh = -2|[ bp+ arg_2 ]+1|-(-2)-1 = 1|[ bp+ arg_2 ]+1-1 = [ bp+ arg_2 ]
2| 0FFFDh = -3|[ bp+ arg_2 ]+2|-(-3)-1 = 2|[ bp+ arg_2 ]+2-2 = [ bp+ arg_2 ]
3| 0FFFCh = -4|[ bp+ arg_2 ]+3|-(-4)-1 = 3|[ bp+ arg_2 ]+3-3 = [ bp+ arg_2 ]
...|...|...|...|...
N| 0FFFFh-N = -N-1|[ bp+ arg_2 ]+N|-(-N-1)-1 = N|[ bp+ arg_2 ]+N-N = [ bp+ arg_2 ]

Ok, so after doing sub di, cx, the value di is... [bp+arg_2], which was its initial value before all that!

Again you may wonder why such a convoluted way to assign [bp+arg_2] to di, instead of just doing a plain simple mov di, [bp+arg_2]?

Well the reason here again lies in the bytes:
  • the mov version uses up 3 bytes, while the sub version uses up 2 bytes only
  • it's true that it only works because of the previous not cx instruction, which also takes 2 bytes...
  • so that would be 4 bytes altogether, which is more than 3 bytes for a single mov
  • but with those 4 bytes, you get more than the mov alone, because you also get the string length in cx as we saw earlier... and you will see that it is of further use in the next instructions.

4.8.3. Value swap with xchg

The next instruction xchg, which stands for 'exchange', is also quite simple to understand: it swaps the values of its 2 operands.

That is to say that after xchg di, si, the register di contains what was in si just before, and vice-versa, si contains what was in di.

4.9. Jump! Jump! Jump!

While we are nearing on the end of analyzing our sub-routine, in the next 6 instructions we find a fundamental assembly instruction, the conditional jump:

Rich (BB code):
Bytes:      | Text view:
55          | seg019:1E26 000 push    bp
8B EC       | seg019:1E27 002 mov     bp, sp
8B D7       | seg019:1E29 002 mov     dx, di
8B DE       | seg019:1E2B 002 mov     bx, si
8C D8       | seg019:1E2D 002 mov     ax, ds
8E C0       | seg019:1E2F 002 mov     es, ax
            | seg019:1E31         assume es:dseg
8B 7E 06    | seg019:1E31 002 mov     di, [bp+arg_0]
33 C0       | seg019:1E34 002 xor     ax, ax
B9 FF FF    | seg019:1E36 002 mov     cx, 0FFFFh
F2 AE       | seg019:1E39 002 repne scasb
F2 AE       | seg019:1E3B 002 lea     si, [di-1]
8B 7E 08    | seg019:1E3E 002 mov     di, [bp+arg_2]
B9 FF FF    | seg019:1E41 002 mov     cx, 0FFFFh
F2 AE       | seg019:1E44 002 repne scasb
F7 D1       | seg019:1E46 002 not     cx
2B F9       | seg019:1E48 002 sub     di, cx
87 FE       | seg019:1E4A 002 xchg    di, si
8B 46 06    | seg019:1E4C 002 mov     ax, [bp+arg_0]
F7 C6 01 00 | seg019:1E4F 002 test    si, 1
74 02       | seg019:1E53 002 jz      short loc_223D7
A4          | seg019:1E55 002 movsb
49          | seg019:1E56 002 dec     cx
            | seg019:1E57
            | seg019:1E57     loc_223D7:              ; CODE XREF: sub_223A6+2Dj
D1 E9       | seg019:1E57 002 shr     cx, 1
...
8B F3       | seg019:1E5F 002 mov     si, bx
8B FA       | seg019:1E61 002 mov     di, dx
5D          | seg019:1E63 002 pop     bp
CB          | seg019:1E64 000 retf

Let's quickly skip over the mov ax, [bp+arg_0], that we already master now (hopefully!): it copies the offset of the first string argument into ax.

The next instruction test is new to us: it performs a bitwise AND on its 2 arguments, changing the flags but without storing the result.

Practically speaking, test si, 1 is testing whether the lowest bit of si is 1. This is equivalent to checking whether si is even (can be divided by 2).

The result of the test will only affect the flags: most notably, if the result of the bitwise AND is zero, then the zero flag zf will be set to 1, otherwise it is set to 0.

Why is this important? Well, because of the next instruction:
jz short loc_223D7

Let's go straight to the point: jz stands for "jump if zero". It is a conditional jump.

In longer, plain English, what happens in a conditional jump is: "the program execution must go ('jump') directly to the instruction indicated by the operand if <condition> is met; otherwise, the program continues by executing the instruction right after the 'jump' instruction".

In our case, the operand is short loc_223D7, and the <condition> is 'zf is set to 1'.

From the instruction block we are analyzing, we see that the operand short loc_223D7 is referring to a location just a few bytes ahead, at seg019:1E57. So the jump will only skip 2 instructions if the condition is met.

These 2 instructions are:
  • movsb:
    • its unusual, operand-less form should give you a hint that it is a string operation
    • it stands for "mov string byte"
    • it moves (= copies) the byte value at ds:si (source) to the byte location at es:di (destination), and increments si and di
  • dec cx:
    • it stands for "decrement cx (by 1)"

All in all, if we consider those 6 instructions together, they are equivalent to the following pseudo-C code:
Rich (BB code):
mov ax, [bp+arg_0]

if( (si&1) != 0 ) {
    es[di++] = ds[si++];
}

shr cx, 1

Or the following plain english: "if si is odd, copy the byte at es:di to ds:si, then increment si, di and decrement cx"

Let's take a minute to see what's actually happening, by considering the values of involved registers as instructions have unfolded so far:

instruction| ds | si | es | di | cx |Comment
mov ax, ds | dseg |?|?|?|?|Copying ds to ax...
mov es, ax |dseg|?| dseg |?|?|...then ax to es
mov di, [bp+arg_0] |dseg|?|dseg| [bp+arg_0] |?|di now contains offset of first string argument (the unknown string)
xor ax, ax |dseg|?|dseg|[bp+arg_0]|?|ax set to 0
mov cx, 0FFFFh |dseg|?|dseg|[bp+arg_0]| -1 |cx set to 0FFFFh = -1 (or 65535)
repne scasb |dseg|?|dseg| [bp+arg_0]+N | -1-N |The first string argument (arg_0) is scanned to find the value 0, which marks its end: now di points to the end of string arg_0, while cx is -1-N, where N is the length of arg_0
lea si, [di-1] |dseg| [bp+arg_0]+N-1 |dseg|[bp+arg_0]+N|-1-N|si is now equal to di-1
mov di, [bp+arg_2] |dseg|[bp+arg_0]+N-1|dseg| [bp+arg_2] |-1-N|di now contains offset of second string argument (the known string, "Break treaty...")
mov cx, 0FFFFh |dseg|[bp+arg_0]+N-1|dseg|[bp+arg_2]| -1 |cx set to 0FFFFh = -1 again
repne scasb |dseg|[bp+arg_0]+N-1|dseg| [bp+arg_2]+M | -1-M |The second string argument (arg_2) is scanned to find the value 0, which marks its end: now di points to the end of string arg_2, while cx is -1-M, where M is the length of arg_2
not cx |dseg|[bp+arg_0]+N-1|dseg|[bp+arg_2]+M|-(-1-M)-1 = M |cx now contains M, the length of arg_2
sub di, cx |dseg|[bp+arg_0]+N-1|dseg| [bp+arg_2] |M|di is set back to the offset of arg_2
xchg di, si |dseg| [bp+arg_2] |dseg| [bp+arg_0]+N-1 |M|di and si values are swapped
mov ax, [bp+arg_0] |dseg|[bp+arg_2]|dseg|[bp+arg_0]+N-1|M|ax set to offset of first string argument (arg_0)
test si, 1 |dseg|[bp+arg_2]|dseg|[bp+arg_0]+N-1|M|checks whether si, that is [bp+arg_2], the offset of the second string argument, is odd

So the values of si and di are now:
  • si = [bp+arg_2] -> offset of the first byte of string arg_2
  • di = [bp+arg_0]+N-1 -> offset of the last byte of string arg_0 (the 'zero' byte)

Consequently, if si is actually odd, 1 byte is copied from the beginning of string arg_2 to the end of string arg_0, overwriting its last byte '0'... And then both si and di are incremented, and cx, which contains the length of arg_2, is decremented...

More to the point: si and di are incremented, ready to copy the next byte, and cx, which contains the length of si, is decremented indicating the number of remaining bytes to copy...

Do you feel what's starting to happen here? If you think "string copy" you're quite right... You're even righter if you think "string append".

4.10. Bit-shifting and divisions

But let's not get too much ahead of ourselves, and look at the next couple instructions:

Rich (BB code):
Bytes:      | Text view:
55          | seg019:1E26 000 push    bp
8B EC       | seg019:1E27 002 mov     bp, sp
8B D7       | seg019:1E29 002 mov     dx, di
8B DE       | seg019:1E2B 002 mov     bx, si
...
F7 C6 01 00 | seg019:1E4F 002 test    si, 1
74 02       | seg019:1E53 002 jz      short loc_223D7
A4          | seg019:1E55 002 movsb
49          | seg019:1E56 002 dec     cx
            | seg019:1E57
            | seg019:1E57     loc_223D7:              ; CODE XREF: sub_223A6+2Dj
D1 E9       | seg019:1E57 002 shr     cx, 1
F3 A5       | seg019:1E59 002 rep movsw
...
8B F3       | seg019:1E5F 002 mov     si, bx
8B FA       | seg019:1E61 002 mov     di, dx
5D          | seg019:1E63 002 pop     bp
CB          | seg019:1E64 000 retf

First we have shr, which stands for "shift right": this is a bitwise right-shift operation, meaning that all of its first operand's bits are shifted (moved) to the location at their right, by an amount described by the second operand.

In pseudo-C, shr cx, 1 is equivalent to: cx = cx>>1; or cx >>= 1;

If you are not familiar with bit-shifting operations, here is a good opportunity to note the following:
  • In bit notation (or binary notation), shifting bits by 1 position is equivalent to multiplying (left) or dividing (right) by 2:
    • 57h = 01010111 = 87
    • 57h << 1 = 10101110 = 174 = 87*2
    • 57h >> 1 = 00101011 = 43 = 87/2 (integer division)

So our instruction shr cx, 1 is equivalent to: cx = cx/2;

But why divide cx by 2? The answer comes in the next instruction: rep movsw

This instruction combines multiple elements we have seen before, with slight differences:
  • Similar to the earlier repne prefix, the rep prefix means "repeat until cx is 0"
  • Similar to the earlier movsb prefix, the movsw prefix stands for "move string word" (instead of byte); it moves (= copies) the word value at ds:si (source) to the word location at es:di (destination), and increments si and di by 2 (instead of 1)

You get the idea, right?

Since a word is a 2-byte value, what those 2 instructions are doing is: "append cx/2 times 2-byte words from arg_2 to the end of arg_0"

Which is equivalent to: "append cx times 1-byte bytes from arg_2 to the end of arg_0".

Well it is almost equivalent, but not totally, as we see in the next section...

4.11. The final byte in the carry

What happens if cx contains an odd value?

Then the integer division cx/2 will only yield the total count of 2-byte words fully contained in cx, but cx being odd, there will be one last remaining byte not accounted for.

For instance:
  • If cx is 6Eh, which is 110 in decimal notation, which is even, then cx/2 is 55: we find that 55 2-byte words do properly correspond to 110 total bytes, which was the initial value of cx
  • But if cx is 57h, which is 87 in decimal notation, which is odd, then cx/2 is 43: in this case, 43 2-byte words, which are totally 86 bytes, do not correspond to the initial value of 87 in cx

Fortunately, x86 assembly can help with this situation, because when executing shr cx, 1, it stores the discarded bit in the carry flag cf.

Now, all that remains is to copy the last byte, IF the value of cf is 1, which is done with 2 remaining instructions:

Rich (BB code):
Bytes:      | Text view:
55          | seg019:1E26 000 push    bp
8B EC       | seg019:1E27 002 mov     bp, sp
8B D7       | seg019:1E29 002 mov     dx, di
8B DE       | seg019:1E2B 002 mov     bx, si
...
F7 C6 01 00 | seg019:1E4F 002 test    si, 1
74 02       | seg019:1E53 002 jz      short loc_223D7
A4          | seg019:1E55 002 movsb
49          | seg019:1E56 002 dec     cx
            | seg019:1E57
            | seg019:1E57     loc_223D7:              ; CODE XREF: sub_223A6+2Dj
D1 E9       | seg019:1E57 002 shr     cx, 1
F3 A5       | seg019:1E59 002 rep movsw
13 C9       | seg019:1E5B 002 adc     cx, cx
F3 A4       | seg019:1E5D 002 rep movsb
8B F3       | seg019:1E5F 002 mov     si, bx
8B FA       | seg019:1E61 002 mov     di, dx
5D          | seg019:1E63 002 pop     bp
CB          | seg019:1E64 000 retf

The instruction adc stands for "add with carry": it performs the addition of its two operands, in addition to the value of the carry flag cf.

After the previous instruction rep movsw, we know that the value of cx is 0, since it is decremented for each iteration until it becomes 0.

So adc cx, cx is actually equivalent to: cx = cx + cx + cf = 0 + 0 + cf = cf:
  • If the value of cf is 0: this means that shr cx, 1 discarded a '0' bit, which was the LSB of cx before shifting, which means that cx was even -> cx is now 0
  • If the value of cf is 1: this means that shr cx, 1 discarded a '1' bit, which was the LSB of cx before shifting, which means that cx was odd -> cx is now 1

So the next instruction rep movsb is copying a last byte if cx was odd before right-shifting, which is exactly what is needed to copy the possible last byte.

Tutorial continues in next post
 
Last edited:
4.12. Conclusion

Wrapping up all we saw before, we are now confident that this sub-routine appends its arg_2 string argument at the end of its arg_0 string argument.

This operation is typically used to concatenate small pieces of strings into a larger string buffer.

This why we can now rename the sub-routine to something more telling, like appendStringToBuffer:


And rename the 2 arguments as well, like arg_buffer and arg_string:



There are 2 final things you may have wondered, that were not answered before:
  • What is the purpose of the instruction mov ax, [bp+arg_0]?
    • The quick answer to this is that ax is used, by convention, to store the return value of the sub-routine
    • So this sub-routine, after appending a string to a buffer, returns the offset of the buffer, in ax
  • Why did we bother to copy just 1 byte if arg_2 was odd, before copying full words, and then copy 1 byte again if cx was odd? (the whole conditional jump thing with test si, 1)
    • The quick answer to this is alignment: because we're dealing with a 16-bit architecture, full 2-byte words can only be accessed every 2 bytes, which means at even addresses only
    • In other words, if si was odd, it needs to be re-aligned to an even value first, before executing word operations like movsw

This concludes the analysis of this sub-routine.

Incidentally, you may want to look at the next couple of sub-routines right after this one, and try to figure out what they do.

If you followed this tutorial entirely, it should be quite simple, because they only differ from the sub-routine we analyzed by a few instructions: they are all strings-related utility functions.

Anyhow, thanks a bunch if you read it this far, and keep an eye on the next installment - but be patient ;)

Cheers!
 
Last edited:
I'm impressed with the thoroughness of these installments. They must have taken a long time to write, and I bet we are not a lot of us who finds it interesting so I appreciate your work :)

I had named this subroutine "somethingWithStrings?" in my IDA project. So I had correctly identified it did some kind of string manipulation but not exactly what.

Another great installment Darkpanda! I'm looking forward to the next. Specifically I'm interested in figuring out where IDA fails to to do code analysis and why, and how to re-analyse these parts. Somewhere in the code it seems to call a null pointers, which I am guessing is because the data has not been analysed as code.
 
Your description "calls to null subs" rather sounds like overlay related issues, but it would be interesting to have more details, like what's your offset?

Incidentally I am quite happy because I just figured out a few days ago how external "driver" code is loaded into the main code, like MGRAPHIC.EXE or ASOUND.CVL... It could be linked to your issue.

But besides this, I will try to find a good example of undecoded code.
 
Ooooh, I'm glad to see this! It's too late to delve into this tonight, but when I'm better-rested and sunrise is farther off, I'll certainly be reading through. :thumbsup:
 
What you achieved is really amazing. I must say, I never analyzed code to such depths (I saw your explanations of map generation process - fascinating). My interests were somewhat more limited - some cheating (but always for all sides ;) ) and maybe some "corrections".
I also find debugging very useful technique - it is easier to analyze the code if one sees what happens to variables and sees memory "context". For this I use Cheat Engine + DOSBox Debug, although if someone found a way to debug things with IDA, that would probably quadruple "productivity".

With debugging under DOSBox I was able to correct (or evade) some annoying features of AI behaviour. For example AI never irrigates/changes tiles other than those which gives 1 food bonus, and what follows never clear Jungles, Forests and Swamps. Cheat engine allowed me to "simulate" Breakpoint on read, which DOSBox Debug severely lacks and I found why this is happening.

All in all, I must say this IDA is pretty powerful stuff (in proper hands :) ). One may hope someone will use it to correct some nasty bugs in Master of Magic (another of my favourites). ;)
 
I know IDA has some feature to be linked with a debugger, but I never cared to investigate it, especially for an MS-DOS program... I don't think it is aimed at old systems, for instance it doesn't decompile 16-bit x86 (at least the free version).

DOSBox debug is definitely a gem: it made it possible for me to decipher some of the most intricate part of CIV's assembly - specially system and runtime code, such as overlay and driver loading. It also makes miracles when trying to understand the source of bugs.
But I never thought of using CheatEngine as well, that is quite smart actually :)

Thanks for the kudos, too, although I realized just a few days ago that the subroutine analyzed in this part is most likely strcat, statically linked from the standard C <string> library... My most recent guess is that CIV.EXE was compiled with Microsoft C compiler v5.1 (1988). There's actually a fair portion of the code which has nothing to do with CIV but is simply compiler-generated runtime code, or standard libraries.
In short: I'll focus on real game code in the next installments.
 
I have registered here so long ago that I used for registration e-mail which I abandoned long since. Therefore I haven't got received notification of your reply until I reloaded thread, hence delay in my reply.

I don't think it is aimed at old systems, for instance it doesn't decompile 16-bit x86 (at least the free version).

You puzzled me here. Isn't Civilization a 16-bit x86 application?

DOSBox debug is definitely a gem: it made it possible for me to decipher some of the most intricate part of CIV's assembly - specially system and runtime code, such as overlay and driver loading. It also makes miracles when trying to understand the source of bugs.

I must say 32-bit applications are much easier in this respect. These overlays can be really annoying. I made a cheat once which modified some code, but I used Cheat engine to apply those changes to the code. Then I met AI Player - "diplomacy" most likely uses overlays, because my cheat was disabled when I finished talking (I think original code was reloaded from .exe).

Thanks for the kudos, too, although I realized just a few days ago that the subroutine analyzed in this part is most likely strcat, statically linked from the standard C <string> library...
My most recent guess is that CIV.EXE was compiled with Microsoft C compiler v5.1 (1988). There's actually a fair portion of the code which has nothing to do with CIV but is simply compiler-generated runtime code, or standard libraries.

I don't remember, but did DOS had dynamically linked libraries at all? If not then there will be plenty of stuff like this...

In short: I'll focus on real game code in the next installments.

Could be very interesting! I was for example trying to find what was called in Civilization 2 a tech paradigm. I think I found it, but since I don't do so deep analyzing, I was not sure.
EDIT: I think Civ 1 and 2 were coded in some not very good fashion - at least when considering possibility of changing some parameters. I think there isn't any place in which one could set courthouse to clean 60% of corruption (or 90%). It is not a parameter which lays somewhere, rather it is put directly in code (shr corruption, 1 - something like this).

As for Master of Magic: sadly I never played it! I only remember the reviews at that time praising it as a "CIV with a fantasy twist"...

Well, it was great, but extremely buggy and also not very well designed - they for example miscalculated the fact that one could have a city which produced more than 255 gold (thx to "capitalization") - and they assigned for Gold only one byte, which caused it to "wrap". Civilization is much better in this respect - I once played on trade multiplied 10 times and apart for some graphical glitches it worked very well.
 
I have registered here so long ago that I used for registration e-mail which I abandoned long since. Therefore I haven't got received notification of your reply until I reloaded thread, hence delay in my reply.

No worries, activity on the Civ1 forums is rather scarce anyway... And for some reason I'm no longer receiving notification e-mails when threads I follow are updated... No sweat:)

You puzzled me here. Isn't Civilization a 16-bit x86 application?

Sorry I meant decompile as opposed to disassemble: IDA can generate template C source code from 32-bit x86 win32/Linux assembly, but not from 16-bit MS-DOS, which it can only disassemble.


I must say 32-bit applications are much easier in this respect. These overlays can be really annoying. I made a cheat once which modified some code, but I used Cheat engine to apply those changes to the code. Then I met AI Player - "diplomacy" most likely uses overlays, because my cheat was disabled when I finished talking (I think original code was reloaded from .exe).

So far I've only worked with segmented 16-bit so I can't tell the difference... But I guess it surely makes things easier to work with flat addresses...
In my obsession with CIV.EXE, I'm currently investigating the possibility to crunch all overlays back into a single monolithic EXE - the original CIV.EXE with overlays is around 320kb, far less than the 640kb limit anyway, so that should be possible I believe. I am also trying to get it to decompile (not just disassemble) to simplify the reverse engineering, but that's really tough work.



I don't remember, but did DOS had dynamically linked libraries at all? If not then there will be plenty of stuff like this...

It didn't - something I discovered very recently, after spending huge amounts of time browsing the runtime startup code, setting up interrupt handlers, PSP management, and other funny stuff... The real 'main()' starts later on, and from there on the assembly is much more straightforward to understand by comparison.



Could be very interesting! I was for example trying to find what was called in Civilization 2 a tech paradigm. I think I found it, but since I don't do so deep analyzing, I was not sure.
EDIT: I think Civ 1 and 2 were coded in some not very good fashion - at least when considering possibility of changing some parameters. I think there isn't any place in which one could set courthouse to clean 60% of corruption (or 90%). It is not a parameter which lays somewhere, rather it is put directly in code (shr corruption, 1 - something like this).

Not everything is designed to be parameterizable indeed, just like Wonders and buildings effects, which are hard-coded. But still there's a fair amount that's quite easy to customize: unit types, terrain properties, technology tree, most of the in-game text, to mention a few.

When browsing through disassembly, it is rather easy to see which parts were well thought out, and which ones were added/discarded more hastily... It's all the more fascinating that Sid Meier was the main programmer on it, it almost feels like being inside his brain - his brain from 1991, amyway.
 
In my obsession with CIV.EXE, I'm currently investigating the possibility to crunch all overlays back into a single monolithic EXE - the original CIV.EXE with overlays is around 320kb, far less than the 640kb limit anyway, so that should be possible I believe.


It works!!!
After limited debugging, I'm still surprised it was 'so simple'... now, ideally I need to squeeze in the drivers too, and maybe IDA will finally be able to disassemble it in one shot!
As a side effect this also opens a proper front for arbitrary code additions... Could be fun.
 
No worries, activity on the Civ1 forums is rather scarce anyway... And for some reason I'm no longer receiving notification e-mails when threads I follow are updated... No sweat:)

So it is settled, it doesn't work - I changed email to proper one and I still don't get notifications.

Sorry I meant decompile as opposed to disassemble: IDA can generate template C source code from 32-bit x86 win32/Linux assembly, but not from 16-bit MS-DOS, which it can only disassemble.

And that is what you said. I just misread you; I should read more carefully... ;)

So far I've only worked with segmented 16-bit so I can't tell the difference... But I guess it surely makes things easier to work with flat addresses...

Well, as I see it - something like Cheat Engine does when using AutoAssembly (namely it allocates new memory and jumps to it) is more difficult here in this model.

In my obsession with CIV.EXE, I'm currently investigating the possibility to crunch all overlays back into a single monolithic EXE - the original CIV.EXE with overlays is around 320kb, far less than the 640kb limit anyway, so that should be possible I believe. I am also trying to get it to decompile (not just disassemble) to simplify the reverse engineering, but that's really tough work.

But in 640 kB program has to fit also current game data - which is perhaps not that much (127 armies per player, 255 cities globally, map with all those layers) - are you sure it will fit?

It didn't - something I discovered very recently, after spending huge amounts of time browsing the runtime startup code, setting up interrupt handlers, PSP management, and other funny stuff... The real 'main()' starts later on, and from there on the assembly is much more straightforward to understand by comparison.

That is another thing that makes things easier in 32-bit applications - if program jumps to dll function you can read from documentation what it does.

Not everything is designed to be parameterizable indeed, just like Wonders and buildings effects, which are hard-coded. But still there's a fair amount that's quite easy to customize: unit types, terrain properties, technology tree, most of the in-game text, to mention a few.

Well, I may be wrong about bad coding. After all it may be simple compiler optimization. If you tell compiler to calculate 50% of some integer it is likely to use shr X,1 instruction.
I assume these parameters were done by preprocessor #define directives and not kept in structures, like parameters for armies and terrains seem to be.
 
But in 640 kB program has to fit also current game data - which is perhaps not that much (127 armies per player, 255 cities globally, map with all those layers) - are you sure it will fit?

You're right actually... But I just got it to work (in DOSBox, though) so I believe there is no memory problem here. The memory for units, cities, and such is all provided for at CIV.EXE loading time (you can check MZ header for "minimum paragraphs to allocate in addition to code size", and also "initial SS and SP" which should point at the last byte in memory upon load).
Now, in addition to its own code, CIV also loads drivers: MISC.EXE (180 bytes for keyboard and mouse), MGRAPHIC.EXE (7kb, MCGA graphics) and ASOUND.CVL (38kb, AdLib sound). EGRAPHIC.EXE is bigger than MGRAPHIC.EXE for some reason. But all in all, we're still within the bounds: 320kb + 7kb + 38kb = 365kb.
Now, CIV is also allocating graphics memory, at least 3 full 320x200 buffers, with 1 byte per pixel, so that makes 3 x 64000 = 187,5kb (not counting headers).
So we're up to 364kb + 187,5kb = 551,5kb... Still within the bounds.



Well, I may be wrong about bad coding. After all it may be simple compiler optimization. If you tell compiler to calculate 50% of some integer it is likely to use shr X,1 instruction.
I assume these parameters were done by preprocessor #define directives and not kept in structures, like parameters for armies and terrains seem to be.

There are definitely ugly things, coming from macros (#define): abuse of "abs()" when dealing with guaranteed positive values, last minute hard-wired spaceship structure and code (scrapping of the intended spaceship functionality was documented in the release notes actually), some terrain improvements tasks have partially hard-coded turn counts that also feel like last minute programmer adjustments, etc.
But to me that's what makes the reverse-engineering interesting: it really shows that the process of a programming a game like CIV was spread over time, with some polished and carefully designed areas, some aborted attempts, some botched patches here and there, etc. It's a richness that reflects how the actual people were busy soldering this thing...
 
Now, CIV is also allocating graphics memory, at least 3 full 320x200 buffers, with 1 byte per pixel, so that makes 3 x 64000 = 187,5kb (not counting headers).
So we're up to 364kb + 187,5kb = 551,5kb... Still within the bounds.

This comment about 3 buffers reminded me of something. Is there a way to observe your own units in a way similar to enemy units when having option not to observe enemy moves? Or for example just like when one presses Home/End/PageUp/PageDown on South/North Pole - it gives pretty fast movements. To put it simply, is there a way to disable movement animations?

But to me that's what makes the reverse-engineering interesting: it really shows that the process of a programming a game like CIV was spread over time, with some polished and carefully designed areas, some aborted attempts, some botched patches here and there, etc. It's a richness that reflects how the actual people were busy soldering this thing...

Of course I am only debugging things not doing thorough analysis, but I feel code of Civilization is quite clean. I tried to debug Genghis Khan 2 and it was kinda tough compared to Civ1.

PS. Is there a way to search in IDA for Hex representation of instructions and then switch to mnemonic ASM/graph representation? I was looking for a way to visualize what I am debugging and I cannot get IDA to find anything (hex-wise).
 
This comment about 3 buffers reminded me of something. Is there a way to observe your own units in a way similar to enemy units when having option not to observe enemy moves? Or for example just like when one presses Home/End/PageUp/PageDown on South/North Pole - it gives pretty fast movements. To put it simply, is there a way to disable movement animations?

Ha! This definitely sounds like a hacking quest :) My guess is that it is feasible, but of course it needs investigation.



PS. Is there a way to search in IDA for Hex representation of instructions and then switch to mnemonic ASM/graph representation? I was looking for a way to visualize what I am debugging and I cannot get IDA to find anything (hex-wise).

Ok, so from where I stand, there could be 4 reasons, or most likely a combination of those 4 reasons, why you cannot find what you want:
  1. You're using little-endian in IDA (which is wrong): when using hex search in IDA, you should always enter values as big endian, because IDA takes the initiative to convert it to little endian for you... For example, if you want to look for the bytes "CD 3F", which code for "int 3Fh" (the overlay manager interrupt), you should input those in IDA hex search as "3FCD". Otherwise, if you input literrally as "CD 3F", it will typically find "mov ax, 3Fh" instructions, which are coded as "B8 3F CD" - automatically inverted
  2. Your lookup bytes includes a relocated value: not sure how familiar you are with 16-bit segment addressing and MZ EXE format, but you should know that it has a features called relocations, that makes it possible for MS-DOS to load the EXE anywhere in memory, and still use absolute addressing without breaking the code:
    • Basically, 16-bit segmented code uses segments and offsets to address memory and other parts of the code: offsets are always relative to a segment, so they are insensitive to the place where the EXE is loaded in memory; but segments are absolute: their value correspond to absolute positions in memory, at runtime.
    • To make this possible all-the-while being able to load the EXE anywhere in memory, the MZ-EXE format contains a relocation table, which is a list of a all positions in the code whose value is a segment. In the EXE file itself, all segment values are stored as if the EXE was loaded at absolute address "0" in memory. For example, in CIV 474.01 (unpacked), the default value of dseg is "2A1B" and seg000 is at "0". Then, when MS-DOS load the EXE in memory in order to run it, it allocates memory where it can, writes down the allocation position (for example at segment "01A2h"), then loads the program image in memory, and adds the program position to every value recorded in the relocation table. So every instance of "2A1B" in the code that is in the relocation table, get update to "2BBD", which is 2A1B + 1A2.
    • If your lookup bytes contain a relocated value, as read from DOSBox debug, you need to update them to match the program position 0 in IDA; alternatively, you may be able to completely rebase your IDA database to match the base allocated by DOSBox; in my experience, DOSBox always loads CIV_UNP.EXE at segment 1A2h
  3. Your lookup code is in an overlay: even if you managed relocations and endianness, you may still not find your lookup sequence if it is part of an overlay. Indeed, overlays are pieces of code, stored as completely separate MZ executables, but awkwardly located within CIV.EXE itself - just make a seach for the 'MZ' magic bytes in CIV.EXE with a hex editor, and you'll see what I mean:
    • Meant to reduce the memory footprint of the program (at the expense of loading code at runtime), those pieces of code are dynamically loaded, at runtime, when necessary, using an overlay interrupt (int 3Fh)
    • In the main CIV.EXE program, a segment is dedicated to loading overlays (seg020), and everytime a new overlay is loaded, it completely overwrites the segment contents, discarding the overlay that was loaded before.
    • It so happens that a good portion of CIV code lies within overlays (map generation, save/load, most reports, some diplomacy, some AI, etc.)
    • IDA v5.0 (and presumably earlier versions too) does not recognize overlays, and as such they are not present in your IDA database by default. There is a way to have them all in IDA, and transform overlay interrupts into usual far calls, but I'll keep the details for the next tutorial installment ;)
    • So if the lookup bytes you are looking for actually come from an overlay, you won't find them in IDA; just check whether your code is in seg020 in DOSBox (using the relocations arithmetic from the previous point; in my experience, seg020 is at 252Ah in DOSBox)
  4. Last but not least, your lookup code is in a driver: by driver I mean one the files MISC.EXE, *GRAPHIC.EXE or *SOUND.CVL. All those files are standalone MZ EXE files which are also loaded dynamically, at runtime, by CIV.EXE, upon choosing the preferred Video and Audio settings:
    • Contrary, to overlays, CIV loads drivers after the main code, within their own allocated memory, and then makes some intricate self-code patching to link CIV.EXE code with the drivers code - see all those "jmp far ptr 0:0" after seg000:75Fh? Those are getting patched with far jumps to actual driver code, after the driver is loaded
    • Here again, IDA is definitely not going to help you unless you already managed to get the drivers in IDA and link them with CIV manually
    • So, again, if your lookup bytes are in the driver code, forget about IDA - until the next tutorial installment anyway. To make sure whether that is the case, check the segment where your code lies in DOSBox (debug), and verify whether it is within or beyond the range of code disassembled by IDA, still by using relocation arithmetic from point 2, or alternatively, by checking whether CS is above SS (indeed SS contains the Stack Segment, which is always the last segment of a program, while CS is the code segment, where your currently running code is located... if CS is above SS, then you're located in code which is beyond the stack, i.e. which was loaded after the program started, i.e. driver code)

If you still can't find your code, please give more specific details, if you want me to help you.
 
Top Bottom