darkpanda (Dark Prince), joined Oct 28, 2007, 844 messages
This thread is PART 1 of a tutorial about reverse-engineering CIV.EXE with the help of IDA. You can directly access other parts using the links below:
- PART 1
- PART 2
Note that this tutorial is split across several posts/threads because of this forum's limit of 30,000 characters per post.
Foreword
I've been thinking about writing a tutorial on CIV reverse-engineering with IDA for a long time, but besides time constraints, I was questioning its usefulness anyway...
Finally, I've decided to give it a go, because:
- even a small tutorial can help a lot: I really liked how Gowron (kudos to him as always) gave me a small but decisive kickstart when I first opened IDA; it definitely led to the proficiency I have today in assembly and EXE analysis in general, and CIV R/E in particular
- out there, there are people (weevil?) who are undertaking the same arduous task of reverse-engineering CIV from scratch, and I don't see the point of letting them take the hard road when I could give them that same kickstart... In the end, the more eyes on CIV, the better, right? (I don't know about that actually)
- a tutorial like this can easily be extrapolated to other games or programs beyond CIV, be they 16-bit or 32/64-bit, for other OSes, or for other uses of IDA; I would have been very happy to find a guide like this when I first disassembled a piece of software, or as a quick hands-on introduction to assembly
- finally, just because I feel the urge to share my knowledge... Sorry... Can't help it... You don't have to read it anyway
So here it goes!
PRE-REQUISITES
What you need to follow this tutorial:
- Software:
  - a copy of Sid Meier's Civilization for MS-DOS, in particular the CIV.EXE file (preferably version 474.01, which is the one used in this tutorial)
  - an ExePack unpacker, such as unp.exe, available here: http://www.filewatcher.com/m/unp.exe.19813.0.0.html; note: the next version of JCivED will provide this feature as well
  - an MS-DOS environment, such as DOSBox v0.74, or an MS-DOS virtual machine (see tutorials to create one here or there)
  - the last free version of the Interactive Disassembler, IDA v5.0, download link: http://out7.hex-rays.com/files/idafree50.exe
  - optionally: a hexadecimal editor, such as HxD (recommended by Gowron), xvi32 (recommended by Dack), Hex Editor Neo, or UltraEdit (my personal favorite, but not free; trial only)
- Basic knowledge of computer science, such as:
  - boolean algebra
  - data types and structures (bits, bytes, chars, integers, strings, arrays), as well as signed/unsigned logic
  - hexadecimal notation of numerical values
  - language constructs (if, for, while...) and bitwise operations (or, and, not, left/right shift, ...)
  - rough ideas about computer architecture: CPU, memory, operating system, input devices (keyboard...), output devices (screen...)
  - etc.
- Some interest in, or desire to learn, x86 assembly with a side of 16-bit MS-DOS
The above list of knowledge requirements is only indicative: I don't intend to re-explain each of these concepts, and will assume the reader can pick them up on his/her own.
In my opinion, a very good starting point to dive into assembly, while being extremely detailed and comprehensive, is the freely available "The Art of Assembly Language Programming" by Randall Hyde. The MS-DOS/16-bit x86 version is available online here (original) or there (more readable, on phatcode.net), or also for download if you look around.
It doesn't limit itself to assembly, but also covers boolean algebra, cpu architecture, and many other things related to low-level mechanics of a computer.
With the above, you should be well-equipped to begin, so let's go!
STEP 0: Setup your environment
0.1. Install IDA
First things first: install IDA v5 on your computer, if not already done.
If you're running Windows 7 64bit (which is my case), it will be installed by default in directory "C:\Program Files (x86)\IDA Free".
0.2. Unpack CIV.EXE
Gowron already talked about the necessity to unpack CIV.EXE in this thread: Modding Civilization I - Data Tables, so I will not go over this again here.
In order to unpack your CIV.EXE, proceed as follows:
- copy unp.exe to the same directory where CIV.EXE is located
- run your MS-DOS environment and go to that same directory with the DOS shell/command line
- run the following command: UNP CIV.EXE CIV_UNP.EXE <- do not forget the "CIV_UNP.EXE", otherwise this will overwrite your existing CIV.EXE !!!
- here is the expected output when running the command:
The result is that you now have a new file, CIV_UNP.EXE, which is the unpacked version of CIV.EXE.
Both files can be executed to play CIV, although CIV_UNP.EXE will only work if you keep CIV.EXE nearby, in the same directory (a matter of overlays, discussed here: http://forums.civfanatics.com/showthread.php?p=12284742#post12284742)
That's it for setup, we're good to go!
STEP 1: INITIAL DISASSEMBLY
First step: start IDA, which will prompt you with the following "splash" dialog, in which you should click "OK":

The next dialog is a kind of "quickstart" dialog, which lets you choose to open a recent "disassembly database" or create a new one. For this tutorial click on "New":

When creating a new "disassembly database" (this is the name of the files in which IDA stores all information about a disassembled file), you need to tell IDA what type of file it is disassembling. In our case, CIV_UNP.EXE (as well as CIV.EXE) is a standard MS-DOS executable (with .EXE extension), otherwise known as a DOS "MZ" executable, because the first 2 bytes of such files are "M" (0x4D) and "Z" (0x5A), the initials of Mark Zbikowski, one of the MS-DOS developers (more info here: http://en.wikipedia.org/wiki/DOS_MZ_executable).
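If you want to double-check that a file really is an MZ executable before feeding it to IDA, looking at those 2 first bytes is enough. Here is a minimal Python sketch of that check; the function name is my own, nothing here comes from IDA:

```python
# The DOS "MZ" signature: 0x4D ('M') followed by 0x5A ('Z').
MZ_SIGNATURE = bytes([0x4D, 0x5A])


def is_mz_executable(data: bytes) -> bool:
    """Return True if the buffer starts with the 2-byte 'MZ' signature."""
    return data[:2] == MZ_SIGNATURE


print(is_mz_executable(b"MZ\x90\x00"))  # True
print(is_mz_executable(b"PK\x03\x04"))  # False (a ZIP file, for instance)
```

To check a real file, pass it something like `Path("CIV_UNP.EXE").read_bytes()` (with `from pathlib import Path`).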
So click on the "DOS" tab (2nd tab) in the database type selection dialog, and select the first item, named "MZ/LE/DJGPP-COFF/Watcom-W32/RUN Executable", then click "OK":

IDA immediately prompts you to select the file to disassemble, so this is where you go and select "CIV_UNP.EXE" created in the previous step (STEP 0), then click "Open":

IDA then opens a dialog that it calls the "MZ/LE/DJGPP-COFF/Watcom-W32/RUN Executable file loading Wizard". Its first panel contains only a checkbox "Analysis options" that we will leave blank for the moment, and just click "Next":

Next up, the wizard is finished (what a wizard!), and the final panel contains a single checkbox "Start analysis now" that we will leave checked, then click "Finish":

Right after that, IDA displays a warning: when processing CIV_UNP.EXE, it could only find a certain amount of code (0x30C60 bytes), which is significantly smaller than the total size of the file (0x4FA21 bytes), and it asks you whether it should continue processing despite this unexpected finding... We will see later on that this "extra" data is actually a "wonderful" thing called overlays, but for the moment let's just click "Yes":
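Where does such a mismatch come from? The MZ header states how many bytes the DOS loader should read from the file, and anything stored after that is simply ignored by the loader, which is where overlays can live. The sketch below splits a file according to the standard MZ header fields. I have not verified that IDA computes its 0x30C60 figure exactly this way, and `mz_layout` is just a name I made up:

```python
import struct


def mz_layout(data: bytes) -> dict:
    """Split a DOS MZ file into header, load module and trailing (overlay) data.

    Standard MZ header fields used here (little-endian 16-bit words):
      offset 0x02  e_cblp     bytes used in the last 512-byte page (0 = full page)
      offset 0x04  e_cp       number of 512-byte pages in the image
      offset 0x08  e_cparhdr  header size in 16-byte paragraphs
    """
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    e_cblp, e_cp = struct.unpack_from("<HH", data, 0x02)
    (e_cparhdr,) = struct.unpack_from("<H", data, 0x08)

    image_size = e_cp * 512          # whole pages the DOS loader reads...
    if e_cblp:
        image_size -= 512 - e_cblp   # ...minus the unused tail of the last page
    header_size = e_cparhdr * 16

    return {
        "header_size": header_size,
        "load_module_size": image_size - header_size,   # code + data loaded in memory
        "overlay_size": max(0, len(data) - image_size),  # bytes the loader ignores
    }


# Made-up example: 3 pages with 16 bytes used in the last one, a 4-paragraph
# header, followed by 7 extra bytes standing in for overlay data.
hdr = bytearray(64)
hdr[0:2] = b"MZ"
struct.pack_into("<HH", hdr, 0x02, 16, 3)
struct.pack_into("<H", hdr, 0x08, 4)
sample = bytes(hdr) + bytes(2 * 512 + 16 - 64) + b"OVERLAY"
print(mz_layout(sample))  # {'header_size': 64, 'load_module_size': 976, 'overlay_size': 7}
```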

Finally IDA starts analyzing CIV_UNP.EXE and this is where the magic begins:

STEP 2: FIRST ENCOUNTER WITH IDA
2.1. Quick IDA layout overview
The first thing you will notice is the main window's content, listing disassembled instructions in various shades of blue and orange:

While you start staring at it in disbelief, you will also notice the following:
- at the top of the screen, a black/red/yellow/blue bar is being modified in real time, with a red cursor progressing from left to right; this is a high-level "map" of the file, and the progressing cursor is a sign that IDA is still analyzing the code, trying to infer plenty of implicit code logic for you, although you can already start to browse it as you like:
- at the left of the main window, you will see line prefixes in the form "seg019:xxxx" and onwards: those are basically the code addresses of the disassembled code; we will come back to this later on
- at the bottom of the screen, a window with cryptic log-like text is talking about compiling, executing, and analyzing stuff... this is the IDA console, which gives you information about actions that are currently in progress; we'll come back to that later as well
Initially, IDA positions the main code window at a place labelled "start", which corresponds to the starting point (entry point) of the program. Conceptually, this plays more or less the role of the "main()" function in a C/C++ or Java program (strictly speaking, in a compiled C program the entry point is usually the runtime library's startup code, which eventually calls main()). For CIV_UNP.EXE v474.01, the address of the start entry point is "seg019:2E83":
Let's take a moment here to lay an essential foundation for the rest of the tutorial: code addresses, or offsets and segments.
If you already know what there is to know about segments and offsets, just skip this section entirely.
Earlier I stated that "seg019:2E83" is an address, which basically means "the position of a byte". But how exactly does "seg019:2E83" represent the position of a byte in the file?
The short answer is that "seg019:2E83" means the position, or offset, 0x2E83 (11907 in decimal) relative to the segment named 'seg019'.
So if it's only a relative position, how do I get to find the exact, or absolute, offset of this byte from the beginning of the file?
Well the obvious answer is that the absolute position of "seg019:2E83" corresponds to "position of seg019 + the offset 0x2E83" (not exactly true, but it doesn't matter here - I'm thinking about the EXE header, for those wondering...)
Ok, fine, but what is the "position of seg019" then?
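The quick answer is given by pressing CTRL+S within IDA, which pops up the "Segments list dialog" as shown below: it contains all information related to those things called 'segments', including their segment "base":

"Ha!", you might think, "so this 'Base', which is 0x2058 for seg019, corresponds to the 'position of seg019', then, right?"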
Well not exactly: the actual 'position of seg019' corresponds to 'Base * 16', as in 'Base multiplied by 16'. So in hexadecimal notation, this means that 'position of seg019' is: 'Base' * 0x10 = 0x2058 * 0x10 = 0x20580.
So finally, the absolute position corresponding to "seg019:2E83" is 0x20580 + 0x2E83 = 0x23403.
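The same arithmetic as a tiny Python helper (a sketch of the formula above; like the text, it ignores the EXE header, and the function name is mine):

```python
def seg_off_to_linear(segment: int, offset: int) -> int:
    """Combine a 16-bit segment base and a 16-bit offset into one linear address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both fit in 16 bits")
    # Multiplying by 16 (0x10) shifts the base left by one hex digit.
    return segment * 16 + offset


# seg019 has Base 0x2058 in IDA's Segments list, and 'start' is at offset 0x2E83.
print(hex(seg_off_to_linear(0x2058, 0x2E83)))  # 0x23403
```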
Interesting, isn't it? But then, you may start to wonder, why the FLICK do we need to care about obscure things like "seg019:2E83" addresses, when we could as well directly have plain and simple "0x23403"?
The short answer to that is: because we are in a 16-bit context.
The long answer is that, in a 16-bit context, the biggest single data element that the CPU can manipulate is 16 bits, or 2 bytes, wide. Thus, the biggest value that can be expressed with a single data element is 0xFFFF = 65535. Obviously, if your program contains no more than 65536 bytes (offsets 0 to 65535), then all byte offsets (positions) in your program can be identified in an absolute way using only 16-bit values, and thus the program can make cross-references between code and data using 16-bit values only.
But what if the program is bigger than that? The only way is for cross-references to use 2 data elements of 16 bits each, combining them in some way, in order to represent absolute addresses beyond 65535.
This "some way" is the segmented addressing scheme of the Intel 8086 CPU, which MS-DOS programs rely on: it uses two 16-bit elements called the "segment base" and the "offset". Those 2 elements are combined as explained above, that is: [segbase : offset] = [SSSS : TTTT] = 0xSSSS * 16 + 0xTTTT.
Note that this only allows addresses up to 0xFFFF*16 + 0xFFFF = 0x10FFEF (around 1 MB), which became another problem when programs grew past the 1 MB limit, but that is another story...
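Two consequences of this formula are worth seeing in action: the maximum address quoted above, and the fact that segments overlap every 16 bytes, so several different seg:off pairs can name the very same byte. A small sketch (the `normalize` helper is my own illustration; it picks the pair whose offset is in the range 0..15):

```python
def seg_off_to_linear(segment: int, offset: int) -> int:
    """segment * 16 + offset, as in the [SSSS : TTTT] formula."""
    return segment * 16 + offset


def normalize(linear: int) -> tuple[int, int]:
    """Pick the seg:off pair with the largest segment (offset always 0..15)."""
    return linear >> 4, linear & 0xF


# Highest address reachable by combining two 16-bit values.
print(hex(seg_off_to_linear(0xFFFF, 0xFFFF)))  # 0x10ffef

# Many seg:off pairs name the same byte: 2058:2E83 and 2340:0003 are both 0x23403.
seg, off = normalize(seg_off_to_linear(0x2058, 0x2E83))
print(f"{seg:04X}:{off:04X}")  # 2340:0003
```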
2.3. Glancing at segments
By pressing "CTRL+S" in IDA, which pops up the "Segments list dialog" as shown below, we can see all information relative to the segments:

Normally, IDA automatically detects 33 segments in CIV_UNP.EXE, and gives them unique names from "seg000" to "seg032". You will notice that one segment in particular is named differently from all the others: it is named dseg instead of seg031.
You will also see that this segment differs from all the others by its class, which is "DATA". Apart from the last segment, whose class is STACK, all other segments' classes are either CODE or UNK (which stands for 'unknown', meaning IDA doesn't know whether it should be CODE or DATA).
2.4. The first few strings
So IDA named this 32nd segment "dseg": let's have a look at it by double-clicking dseg in the Segments list, or clicking "OK" when it is selected. The main window scrolls down to the beginning of dseg:

What do we have here?
You can see that the initial bytes of dseg have not been interpreted: they are still raw bytes in a vertical sequence. But IDA gives you a hint by showing the ASCII character corresponding to each byte, and there's clearly a readable sequence of characters here: M,S, , R,u,n,-,t,i,m,e,...
We are going to tell IDA that those bytes should be considered as a string:
- With the mouse, click on the first byte:
- Then press the 'A' key, which assigns the type 'string' to a byte; IDA will ask you to confirm that you want to make a string here; click OK:
- And a little bit of magic happens! IDA has gone through the bytes starting from the first one, interpreting all of them as characters up to the C-style string terminator '00h'; also note that the byte now has a name, "aMsRunTimeLibra", built from the "a" prefix and the first few characters of the string:
Indeed, although IDA does its best to automatically guess what is data and what is code, its guessing power is somewhat limited, and its guesses are often wrong. Many bytes in the program are simply left untouched by IDA, too. At best, IDA's auto-analysis is a good starting point; there is still a lot of work ahead of us to analyze what IDA missed, clean up IDA's mistakes and perform further manual code/data discrimination.
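What the 'A' key does in the steps above boils down to "read bytes until the 00h terminator". Here is a minimal Python sketch of that idea; the buffer is made up, and IDA of course does much more (naming, display, cross-references):

```python
def read_c_string(data: bytes, offset: int) -> str:
    """Read bytes from `offset` up to, but not including, the first 00h byte."""
    end = data.find(b"\x00", offset)
    if end == -1:
        raise ValueError("no 00h terminator after offset")
    # Bytes outside ASCII are shown as U+FFFD instead of raising an error.
    return data[offset:end].decode("ascii", errors="replace")


# Made-up buffer: one junk byte, then two NUL-terminated strings.
buf = b"\x90HELLO\x00WORLD\n\x00"
print(repr(read_c_string(buf, 1)))  # 'HELLO'
print(repr(read_c_string(buf, 7)))  # 'WORLD\n'
```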
If you're like me, you already scrolled further down in dseg, and you have probably encountered other strings, this time auto-detected by IDA:

This is much more motivating, actual strings from the game! Don't you feel like you're touching something here? I know I did...
2.5. Cross-references a.k.a. xrefs
2.5.1. Preamble
Before starting to talk about cross-references, let's find another very well-known string, by searching it through IDA.
To do so, click the "search text" button in the top toolbar (or press Alt+T), enter "break":

Then press "Enter" or click "OK": this will scroll the window down to another well-known string, already detected by IDA and named 'aCancelAction_B':

You'll notice that this string spreads over 3 lines: this is because IDA breaks the line as soon as it encounters a byte that it deems not displayable as ASCII, even though it is a single string. So the string here is:
Code:
"!
Cancel action.
Break treaty."
The 0Ah bytes correspond to the better-known "\n", the newline character in C. In addition, you can see that the last byte is 0, as explained earlier.
If you're a CIV player, you know that this string is used when the human player attacks another Civ with whom he has a peace treaty: the user is asked to confirm whether the treaty should be broken, and this string is the list of possible actions. Actually, if you look above and below this string, you will see other strings used in the same context:

That's exciting, but what would be more exciting is to actually find the code that uses those strings... How can we do that?
Let's see: the address of aCancelAction_B is dseg:2ADA. What we are going to do is search in the whole file if we can find any mention of this string.
By "mention of this string" I actually a mean a reference to this string. This is exactly the same concept as a reference in C language, if you've heard it before.
Any reference to this string will use this string's address, in one form of another. Now, we'll just try to find the offset part of the addres, that is "0x2ADA".
To do that, click the "search for sequence of bytes" button (or press Alt+B), enter "2ada", and check the box "Find all occurences":

Then click "Ok" or press "Enter". IDA only finds 1 result, which is quite appropriate:

As you see, the value was found as an operand of an assembly instruction ("mov ax, 2ADAh") at address "seg011:0CDA".
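Why does searching for "2ada" land on that instruction? x86 stores 16-bit values low byte first (little-endian), so the value 0x2ADA sits in the file as the bytes DA 2A. As far as I know, IDA's byte search takes care of that byte order for you when you enter a multi-byte hex number. The sketch below makes the idea explicit; `find_word_le` is a hypothetical helper, and keep in mind that a hit only means 2 bytes match, not that they are really a pointer:

```python
import struct


def find_word_le(data: bytes, value: int) -> list[int]:
    """Return every offset where `value` is stored as a little-endian 16-bit word."""
    needle = struct.pack("<H", value)  # 0x2ADA -> b"\xda\x2a": low byte first
    hits, start = [], 0
    while (pos := data.find(needle, start)) != -1:
        hits.append(pos)
        start = pos + 1
    return hits


# "mov ax, 2ADAh" is encoded as B8 DA 2A (opcode B8 = MOV AX, imm16),
# so the offset shows up right after the opcode byte.
code = bytes([0xB8, 0xDA, 0x2A])
print(find_word_le(code, 0x2ADA))  # [1]
```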
Let's jump to this position by double-clicking the entry in the result list, and IDA scrolls up to the corresponding location:

It is possible that instead of the plain text listing of assembly, IDA directly shows you the "diagram/graph" display, as below:

You can switch between text view and diagram/graph view by simply pressing "Space".
2.5.2. Our first cross-reference
So we have found a data string that we know from playing the game, and we have found a single place in the code (seg011:0CDA) which contains this string's offset.
Let's jump the fence and assume that this instruction's operand (2ADAh) is actually the string's offset. We want IDA to take this assumption as a fact, and we will do so by creating a cross-reference between the code instruction's operand and the actual string.
To do this, click on the operand 2ADAh, and then press Alt+R: this will open the "Choose segment as base" window, in which we will select the string's segment, which is dseg:

Then, click "OK" or press Enter: IDA modifies the code to explicitly use the string's name in the code, instead of its raw offset:

You can double-click the string's name, which will make you jump to the string itself in the data segment: over there, you'll see a new mark to the right of the string, telling you that this string is referenced from somewhere else; this is the cross-reference:

If you hover the mouse on the cross-reference, IDA will give you a quick tooltip overview of the code at the referencing location, too.
2.6. Most essential IDA shortcuts
Here is a handful of very useful shortcuts that will ease your way through IDA:
- Navigation
  - You can double-click or press Enter on cross-references; this will jump to the referenced location
  - Conversely, if you just want to get back to your previous location, press ESC: this one saved me a huge amount of time...
  - Finally, you can also jump directly to any location by pressing 'G' and entering an address in the format 'segXYZ:ABCD'
- Cross-referencing
  - Just a reminder from above: in order to force the referencing of an arbitrary value or operand relative to a segment, press Alt+R when the value is highlighted
  - If you believe you chose the wrong segment, or the value just isn't a memory reference, press Alt+R again when the value is highlighted to remove the segment base reference
  - You may also list all the cross-references to a location by pressing CTRL+X when the referenced item is highlighted
- One last, but definitely useful item in this list: by pressing 'N' you can rename any location in IDA, such as subroutines, memory addresses, etc.
This concludes a very high-level introduction to reverse-engineering CIVDOS with IDA v5, which further continues here: Tutorial: CIV+IDA - Part 2
Cheers!