Lesson 1 – Introduction to Disassembly
How to use W32DASM and the WinAPI Programmer’s Reference
=========================================
Required Tools:
| Code: |
Hex Editor: Hex Workshop
Disassembler: W32Dasm
Documentation Journal: Win32API
Download Test File: http://www.mediafire.com/?cdjpuixrufe |
***************************************
Introduction
————————————————————————–
In this first lesson, we’ll be looking at a simple program (VERY simple) and its disassembled source code. It’s virtually impossible to do anything with an .EXE file without mucking about in its source code. We don’t have the luxury of Open Source here; if we did, we wouldn’t need to learn reversing now would we? So, we need to use a disassembler to generate assembly language source code from our .EXE file.
At first, assembly language might seem a bit difficult. The raw source code that we’ll be examining doesn’t have variables, doesn’t have objects, and doesn’t really resemble any human language. However, it does have its own syntax. Additionally, assembly language (unlike most other languages) doesn’t have tons of synonyms.
There aren’t very many different ways to say the same thing. This is good because you’ll notice certain patterns that repeat themselves again and again. They’ll start to become beacons of light in the spooky, coded darkness.
Stick with it. Once you see some examples, you’ll realize that assembly isn’t that hard to begin to decipher. Also, keep in mind that you usually won’t need to examine every line of code in a program. A big piece of what you’re going to be learning is how to focus in on specific parts of a program to get your work done. So what are we waiting for? Let’s go.
Our First Disassembly
—————————————————————————–
Ok. Here we are. Ready to do a little work?
Good. First thing I want you to do is make sure you have the file we’ll be
working with this lesson. It’s called hello.exe. If you don’t have it, please go download it from the website right now.
Now, if you run the program you’ll see that it’s not very exciting. Hey, we
don’t need exciting programs to learn exciting things! I’ll repeat this many
times throughout the course—even the stupidest, buggiest, lamest application can hold some educational value. If you haven’t run the program, please do so now…
If you have some experience with Windows® programming you’ll probably
realize that this program uses a dialog box. More specifically, it’s using something called a Message Box. If you knew that already, give yourself a
Scooby snack and move ahead a few paragraphs.
There are basically three types of programs you’ll encounter in Windows®.
The first uses what’s called the Single Document Interface (SDI).
An example of this would be good ol’ Notepad. For most users (unless you’ve really played around with your color settings) the background of the Notepad application is white. It has a menu at the top, a title bar above that, and has the usual minimize, maximize and close buttons. The key feature of an SDI application is that you can only open one (1) document at a time. If you try to use Notepad to open a second file, it will close the first (after giving you a chance to save any changes you’ve made).
The second type of program you’ll encounter uses the Multiple Document Interface (MDI).
Examples of MDI applications include Word® and Photoshop®. These programs are more complex. They usually have more menus (although this is not a rule) and other bells and whistles. The key feature of an MDI application is that it can have more than one document open at a time. In Word®, you can have 3 or 4 different files open at the same time, either cutting and pasting from one to the next, or maybe just multitasking.
MDI applications are probably the most complicated to write of the three
types, but you’ll see that sometimes they’re just as easy to reverse as any
other application.
The third program type is usually called a Dialog Based Application.
Dialog based applications are usually the easiest programs to write. However, this doesn’t mean that the programs are any simpler than those in the other two
categories. Many businesses have key applications that are totally dialog
based. A dialog based program usually has a background that’s a sort of
greyish color (again assuming you haven’t changed your color settings). They’re pretty easy to spot and you’ll be seeing a lot of them throughout this course.
***** Those of you that skipped ahead can join the rest of us here *****
Why is it so important to know about the different types of programs that run on Windows®? Because each type of application has certain pieces of code that are unique and we can use that to our advantage. More on that in a later lesson.
Ok, so we’re looking at this funky window (message box) on our screen. Take note of everything you see. Notice that the window says: “Hello There!” Did you take the time to say hello back?? It’s polite to respond you know! The window also says: “Let’s see how this thing works!” and its title is “Disassemble me!!!”
Well, what are you waiting for? Instructions? Oh…..ok.
Close the window (press the close button ‘X’ or the ‘OK’ button). Now, run
W32DAsm. You did get it didn’t you? If you don’t have it or it’s not installed, go do it now.
===============================================
All set? On the Disassembler menu, choose ‘Open File to Disassemble’.
Now, select the hello.exe file where ever you put it on your machine. This is a very small program, so you should get results almost immediately. Larger programs can take up to 5 minutes or so to disassemble. Just warning you now…
If everything has gone according to plan you’ve got a screen full of text which probably doesn’t make a lot of sense. That’s ok…you don’t need to
understand all of this material right now. We’ll introduce the different pieces as we need them in future lessons.
Now, before we go any further, let’s save what we’ve got. Saving creates two files.
The first is a text file that contains the disassembled code.
The other is a project file which (among other things) links the disassembled text file with the original application. By saving you avoid having to re-disassemble a program every time you run W32DAsm, a time saver on those big programs for sure! To save, choose ‘Save Disassembly Text File…’ from the Disassembler menu. Accept the name that’s provided and press ‘OK’.
Now that we’ve saved, we can start looking at a few aspects of W32DAsm that we can use. The first feature to check out is the Functions menu. Under this menu are two sub-menus Imports and Exports. In the case of hello.exe, exports is greyed out which means that no exports exist.
To give a brief explanation, imports are pieces of code (called functions) that the program uses that don’t belong to it. Usually the imports are coming from Windows® .dll files (you’ve seen them before, right?) but they could come from other .dll files as well. Exports are functions that this file is willing to share with the outside world (other applications). If this were a .dll file, you would expect to find exports. Since it’s an .exe file, and a simple one at that, it’s no big surprise that there are no exports.
Let’s take a look at the imported functions. Select Imports from the menu.
You should get a Dialog Box with some buttons and a large text field containing the following two lines:
| Code: |
KERNEL32.ExitProcess USER32.MessageBoxA |
These two lines are very important. They tell us a great deal about how this program works. Each import is divided into two parts, separated by a period.
The first part of each import is the .dll file from which it comes.
The second part is the name of the function.
So, hello.exe imports two functions: MessageBoxA, which comes from User32.dll, and ExitProcess, which comes from Kernel32.dll. In most full-fledged applications the list of imports will be quite long. You’ll usually see imports from KERNEL32 and USER32 as well as from a few other .dll files.
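If you ever want to sort a long import list by .dll, the "library, period, function" layout is easy to pull apart. Here is a minimal Python sketch; `split_import` is a made-up helper name, and the two strings are just the ones W32DAsm showed us:

```python
def split_import(entry):
    """Split a W32DAsm-style import such as 'USER32.MessageBoxA' into (dll, function)."""
    dll, function = entry.split(".", 1)
    return dll, function

imports = ["KERNEL32.ExitProcess", "USER32.MessageBoxA"]
for entry in imports:
    dll, function = split_import(entry)
    print(f"{function} is imported from {dll}.dll")
```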
If you want to know more about a specific function that’s being imported (and I hope you do) you can usually find the info you need in the Windows® API Reference. In the last lesson I recommended you find the electronic, “help file” version. If you’ve got it, fire it up now.
When the help file opens, click the ‘Index’ tab at the top and then type: ExitProcess. As you type, the list below should change until you see the entry ExitProcess. Double-click it and you’ll be brought to the info page on this function. There’s a lot of hyper-linked info here; again, more than we need right now. I’ll highlight a few parts.
If you click the Quick Info button at the top you’ll get a little pop-up window. The pop-up tells you which OS’s the function works on, the import library it comes from, and the header file associated with it. We already knew that ExitProcess came from Kernel32. The header file will be covered in a later lesson.
Next, at the top of the description is a brief summary. It says: The ExitProcess function ends a process and all its threads. As further explanation (also taken from the help file):
An application written for Microsoft® Windows® consists of one or more processes. A process, in the simplest terms, is an executing program. One or more threads run in the context of the process. A thread is the basic unit to which the operating system allocates processor time. A thread can execute any part of the process code, including parts currently
being executed by another thread.
So, basically, this function ends the application. It’s that simple. So, we can now see where in the code the computer is told to stop running the application.
We’ve got our first foothold in the application.
The next import is MessageBoxA. If you try to search for this in the API reference you won’t find it. However, you will find two similar functions: MessageBox and MessageBoxEx. There are slight differences between the versions. The A suffix marks the ANSI (8-bit character) variant of the function; a W suffix marks the Unicode variant.
For all practical purposes, however, we can use the version with no suffix and feel confident that it will be the same as the A version. Open this information now.
MessageBox—The MessageBox function creates, displays, and operates a message box. The message box contains an application-defined message and title, plus any combination of predefined icons and push buttons.
Hmm…you can probably figure out that this is how the message window for this program has been created. This is a key piece of information to know! Let’s look at the information in the API reference a little more closely.
| Code: |
int MessageBox(
    HWND hWnd,          // handle of owner window
    LPCTSTR lpText,     // address of text in message box
    LPCTSTR lpCaption,  // address of title of message box
    UINT uType          // style of message box
);
Parameters
hWnd Identifies the owner window of the message box to be created. If this parameter is NULL, the message box has no owner window.
lpText Points to a null-terminated string containing the message to be displayed.
lpCaption Points to a null-terminated string used for the dialog box title. If this parameter is NULL, the default title Error is used.
uType Specifies a set of bit flags that determine the contents and behavior of the dialog box. This parameter can be a combination of flags from the following groups of flags. |
This part of the reference defines the function. Basically, a function is a piece of code that exists to do a job. Sometimes it can do that job without any information.
Other times, a function may need a little help. In this case, MessageBox needs some information before it can do its job. There are four pieces of information it needs: hWnd, lpText, lpCaption and uType. Each of these four items is called a parameter. There is a (sort of) detailed description of each parameter in the reference.
Basically, hWnd is a number that identifies another window (in this program there is no other window so hWnd is NULL). lpText is the memory address of the text that is shown in the message box that’s created.
lpCaption is the memory address of the title bar text, and uType gives more information about the appearance of the window (if we need it…otherwise it’s 0). These four parameters give you a lot of flexibility with message boxes. We’ll explore them later on.
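Since uType is "a set of bit flags", it may help to see how a couple of flags combine into one number. The sketch below is only an illustration: the constant values are the ones defined in the Win32 headers (winuser.h), but `describe_utype` is a hypothetical helper and it only knows a handful of flags.

```python
# A few flag values from winuser.h; many more exist.
MB_OK = 0x00000000
MB_OKCANCEL = 0x00000001
MB_YESNO = 0x00000004
MB_ICONWARNING = 0x00000030

def describe_utype(utype):
    """Return (buttons, icon) for the small subset of flags listed above."""
    buttons = {MB_OK: "OK", MB_OKCANCEL: "OK/Cancel", MB_YESNO: "Yes/No"}
    button_bits = utype & 0x0F   # low four bits select the button set
    icon_bits = utype & 0xF0     # next four bits select the icon
    if icon_bits == 0:
        icon = "no icon"
    elif icon_bits == MB_ICONWARNING:
        icon = "warning icon"
    else:
        icon = "other icon"
    return buttons.get(button_bits, "other buttons"), icon

print(describe_utype(0))                          # what hello.exe passes
print(describe_utype(MB_YESNO | MB_ICONWARNING))  # flags combined with OR
```

Passing 0, as hello.exe does, simply selects the plain OK button with no icon.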
Using a process similar to what we’ve just done, looking up imports in the API reference can tell you quite a bit about how a program runs. Take a few small applications (Notepad is a good choice), disassemble them and check out some of their API functions. You’ll probably find quite a few more than in our hello.exe program.
Alrighty! We’ll come back to those imports and specifically the MessageBox prototype a little later on in this lesson. Right now let’s look at one more menu in W32DAsm. Click on the Refs menu. You should see three menu choices, two of which are grayed out: Menu References, Dialog References, and String Data References. Since our program uses no menus, it makes sense that this choice should be grayed out. The reason Dialog References is grayed out may be a bit more confusing.
Technically this program is using one dialog box, created by the MessageBox API function. However, the Dialog References menu option refers to dialogs that have been created by the program’s author and compiled into the program.
If you’ve ever used Visual C++, Visual Basic, Delphi or any other graphical RDE (Rapid Development Environment) you understand what this means.
You can drag and drop text boxes, list boxes, radio buttons, command buttons, etc. and place them on the dialog box exactly how you’d like them to appear to the user. Our program’s little window wasn’t created this way; it’s totally created by the code. As a matter of fact, no RDE was used to create this program. This is why Dialog References is grayed out. These two menu options aren’t used that often anyway, so it’s no big loss here.
The last menu option, String Data References, is often the first place to snoop when you’re checking out a program, especially if you have a specific objective in mind (hmmm…)
If you click this menu option right now, you’ll see a dialog box appear that has just one line in it…”Hello There!” That’s the first line that appears in our little window when we run the program!!! Cool!
Quite often you can find a specific string that appears in a program by selecting this menu option. However, be careful. This isn’t always the case. For example, you’ll notice that the line “Let’s see how this thing works!” is missing.
Additionally, the title of the window, “Disassemble me!!!”, doesn’t appear. So, this isn’t a foolproof method for finding a string that appears in a program, but it’s useful. For example, I often find myself searching for strings like “Demo Version”, “Full Version”, “You have %d days left”, “Thank you for registering” and “Invalid serial #”, among others.
You’ll get quite a bit of practice using this menu option in the next few lessons, so don’t worry if you don’t completely understand it yet. I do want to show you one other feature of the menu that’s not readily apparent. If you closed the window that shows the String Data references, open it again from the menu.
You should still see just one string, “Hello There!”. Double click it. The window behind should have changed. Close the “List of String Data” dialog and look at this mess we’ve got going on.
Diving into the Source Code
—————————————————————————–
At the top of the main window you should see a cyan bar highlighting a line that looks like this:
| Code: |
| :00401007 6800304000 push 00403000 |
This line is highlighted because of the double click you just did in the String Data references dialog. Basically, as we’ll discuss in a moment, this line refers to the text string “Hello There!”.
(Note: you could also have double clicked in the Imports dialog to go to the line(s) that refer to an import. Remember, however, that in a normal program a function like MessageBox will probably be called TONS of times. Each double-click will take you to another location in the program. This way, you can scroll through all of the locations where a specific import is used. Ok…back to our regularly scheduled program…)
If you scroll up six lines you should be able to see this on your screen:
| Code: |
//******************** Program Entry Point ********
:00401000 6A00 push 00000000
:00401002 682E304000 push 0040302E

* Possible StringData Ref from Data Obj ->“Hello There!”
|
:00401007 6800304000 push 00403000
:0040100C 6A00 push 00000000

* Reference To: USER32.MessageBoxA, Ord:0195h
|
:0040100E E807000000 Call 0040101A
:00401013 6A00 push 00000000

* Reference To: KERNEL32.ExitProcess, Ord:006Bh
|
:00401015 E806000000 Call 00401020 |
Believe it or not, that’s it! That’s the whole program. Well, that’s all of the program’s executable code. There’s other stuff in the file, like the text strings that appear in our window, but this is the guts of the program. Why don’t we take it line by line, shall we?
//******************** Program Entry Point ********
This is put here by W32DAsm. It’s just meant to mark the spot where the program begins execution. In a future lesson you’ll learn how this is determined.
Changing this entry point is how some virii attach themselves to programs and then get executed so they can spread or cause their mischief.
:00401000 6A00 push 00000000
This is the first actual line of code in the program.
Each line of actual code begins with a colon followed by an eight-digit number. Please be aware that this eight-digit number is in hexadecimal, not decimal.
The first number is the memory location of this line of code.
Each line of code is loaded into memory when the program is executed; the computer’s CPU then moves from line to line executing the commands as it encounters them (this is actually an oversimplification due to multi-threading and multiple process spaces, but we can forget about this most of the time.) So, we’ve got our first line of code at memory location 0x00401000 (the 0x means our number is hexadecimal).
The second column is the hexadecimal op-code. These are the numbers that the computer actually reads and understands. Humans don’t usually program in op-code (Machine Language). Instead, we use Assembly Language, which uses pseudo-English looking symbols. That’s what the 3rd column is all about. Basically, 0x6A00 to the machine is the same as push 00000000 to the assembly language programmer.
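To make the three columns concrete, here is a small Python sketch that splits one listing line into its parts. `parse_line` is a hypothetical helper, and it assumes the simple "colon, address, op-code, instruction" layout shown above. As a bonus, adding the length of the op-code to the address gives the address of the next line.

```python
def parse_line(line):
    """Split ':00401000 6A00 push 00000000' into address, raw bytes and instruction text."""
    address_text, opcode_text, instruction = line.lstrip(":").split(None, 2)
    return int(address_text, 16), bytes.fromhex(opcode_text), instruction

address, raw, instruction = parse_line(":00401000 6A00 push 00000000")
print(f"address     : {address:08X}")
print(f"op-code     : {raw.hex().upper()} ({len(raw)} bytes)")
print(f"instruction : {instruction}")
print(f"next line at: {address + len(raw):08X}")
```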
Now, a few words are in order here to understand what we’re about to go through. In most programming languages you have a construct called a variable. Variables hold all of the information that your program uses: the name of a videotape, the number of times a customer has called to complain, the price of tea in China, whatever. It could be a number, letter or a string of letters.
No matter what the variable is holding, there is one thing they all have in common—where they hold this information.
All variables are stored in memory. There are two major sections of memory (again, major simplification here) the stack and the heap.
Variables that a programmer declares and uses in his or her program are stored on the heap.
This is a section of memory that is set aside just for this program and these variables. It’s sort of like a locker at the gym. You use it for a while, and you may even put a padlock on it. Then, when you leave, you remove the padlock and take your stuff out (hopefully cleaning all of it out). Now somebody else can use that locker. Memory in Windows works like that. Each program has a space all to itself. Other programs are expected to keep their mitts out of this space. When a program is shutdown, the space is now free for the taking. In the old days of Windows 3.1, cleaning out the “locker space” was a big problem.
Too often programs would shutdown and forget to “remove the padlock”. This would cause memory leaks. The only way to get that memory back was to restart the computer. Very bad behavior! Fortunately, those days are gone.
Now, the other portion of memory I mentioned is the stack. The stack is totally different from the heap. Each running thread gets its own stack, a section of memory that keeps track of function calls, the parameters passed to them, and where to return to afterwards. It’s a delicate section of memory that can cause all sorts of problems if you mess with it! However, it’s integral to understanding assembly language programming.
Each time that a Windows API is called, and that API has parameters that need to be passed to it, they are passed on the stack. The stack has a behavior that is very similar to its physical namesake. When you put a card onto a stack, you usually put it on top. When you take off a card, you remove the last one that was placed on the stack (the one on top).
This behavior is called Last In—First Out (LIFO). In assembly language, putting something onto the top of the stack is called pushing onto the stack. Removing something is called popping it off the stack.
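If the card analogy feels abstract, a Python list behaves the same way. This is just a toy model of LIFO behavior, not how the CPU implements its stack; `push` and `pop` are illustrative helper names.

```python
stack = []

def push(value):
    stack.append(value)   # new items always go on top

def pop():
    return stack.pop()    # the most recently pushed item comes off first

push("first card")
push("second card")
push("third card")

order = [pop(), pop(), pop()]
print(order)
```

The cards come back in the reverse of the order they went in.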
So, basically, this line of code is pushing 0x00000000 (or just 0) onto the
stack. Why? Well…you’ll see in a second, ok?
:00401002 682E304000 push 0040302E
Ok. This next line of code is stored in memory at 0x00401002. The op-code is 682E304000. This op-code stands for the assembly language statement PUSH 0040302E, which means push the value 0x0040302E onto the top of the stack (again, reasons will be explained in a moment.)
Take a look at the op-code for a moment: 682E304000.
The 68 is the actual code for the command PUSH. The next set of numbers (called an operand) represents the value to be pushed: 2E304000. This doesn’t seem to agree with what I just told you a moment ago, does it? Well, see, the Intel chip is a funny thing. It doesn’t like numbers to be “in order”. It uses what’s called little-endian byte order. Basically, this is a fancy way to say the bytes are written backwards, least significant byte first. A byte is a location in the computer’s memory that can hold a number from 0 to 255. In hexadecimal this is written 0x00 to 0xFF. In other words, a byte is represented by 2 digits in hexadecimal. So, if we reverse the number 2E304000, keeping each 2 digit byte together, we get 0040302E, don’t we? And this is the number that was pushed onto the stack.
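You can check the byte reversal yourself. The sketch below uses Python's standard `struct` module, where the format `"<I"` means "little-endian, unsigned 32-bit integer"; the variable names are just for illustration.

```python
import struct

raw = bytes.fromhex("682E304000")   # op-code bytes exactly as they sit in the file
opcode, operand_bytes = raw[0], raw[1:]

# Interpret the four operand bytes as a little-endian 32-bit number.
(value,) = struct.unpack("<I", operand_bytes)
print(f"op-code {opcode:02X}, operand {value:08X}")

# Reversing the bytes by hand gives the same digits.
by_hand = operand_bytes[::-1].hex().upper()
print(by_hand)
```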
Now, there’s one other strange thing that’s appeared in the first two lines of this program. Both of the lines we’ve examined were PUSH statements. However, the op-codes were different. Kudos to you if you noticed that. The first statement, PUSH 00000000, was 6A00. This would imply that 6A is the op-code for PUSH. But the second line uses 68 as the op-code for PUSH. What gives?
The answer is that there are different op-codes for the same command. They vary based upon the length of the operand that follows the op-code itself. Notice that there is only 1 byte following the op-code 6A (PUSH 1 byte). There are 4 bytes following the code 68 (PUSH 4 bytes). When you don’t need to store 4 bytes (like when you’re pushing 0) you can save space in your .EXE by using the shorter form. Most compilers and assemblers do this sort of optimization for you, so you don’t need to think about it. The point here is that op-codes are more difficult to work with than assembly language commands.
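Here is a rough sketch of the choice an assembler makes between the two PUSH forms. `encode_push` is a hypothetical helper that only covers the two encodings seen in hello.exe; one detail the text above doesn't mention is that the CPU sign-extends the one-byte operand, which is why the short form is limited to values from -128 to 127.

```python
import struct

def encode_push(value):
    """Encode 'push <immediate>' using the shorter form when the value fits in one byte."""
    if -128 <= value <= 127:
        # 6A: one-byte operand, sign-extended to 32 bits by the CPU
        return bytes([0x6A]) + struct.pack("<b", value)
    # 68: four-byte operand stored in little-endian order
    return bytes([0x68]) + struct.pack("<I", value & 0xFFFFFFFF)

for value in (0x00000000, 0x0040302E, 0x00403000):
    print(f"push {value:08X} -> {encode_push(value).hex().upper()}")
```

The three results match the op-code column of the listing: 2 bytes for the zeros, 5 bytes for the addresses.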
Let’s blast through a couple more lines:
* Possible StringData Ref from Data Obj ->”Hello There!”
|
:00401007 6800304000 push 00403000
:0040100C 6A00 push 00000000
The line that begins * Possible… has been placed there by W32DAsm. Basically, what it’s saying is, “Hey…I think this next line might be talking about the String Data reference, ‘Hello There!’” Remember that this was the line that was highlighted when you double clicked this string in the String Data dialog box?
This is the reason why.
W32DAsm doesn’t just disassemble the file into assembly language. It tries to draw connections between the different parts of the program.
The instruction stored at memory address 0x00401007 says push 0x00403000 onto the stack. Take a moment to look at the op-code version and make sure you see how this number is written in little-endian byte order.
So, the disassembler is telling you it believes that the number 0x00403000 is somehow related to the string “Hello There!”. It’s trying to help you convert these strange numbers into stuff you’d understand better. In a moment you’ll see why this number represents that string. First, look at the last line above.
It’s another push statement, this time pushing 0 again. And notice that the op-code is again 6A instead of 68, representing a 1 byte operand.
Ok, another quick detour. Open up the ‘HexData’ menu, and select the ‘Hex Display of Data Objects/Segments’ option. A dialog box should open that has
memory addresses down the left side, starting with 00403000, eight hexadecimal bytes in the middle, and 8 ASCII characters on the right. Basically, what you’re looking at here is a dump of a section of the program’s memory.
You’re getting to see what character is stored at each and every location in that section of memory.
Notice that starting at memory location 0x00403000 the characters spell out “Hello There!” After the exclamation mark are two periods. These are not actually periods; the hex display shows a period for any character it can’t display. Look at the hexadecimal values for those two bytes: 0x0d and 0x0a. These stand for ‘carriage return’ and ‘line feed’ respectively. Together they cause the text to drop down to the next line on the screen. A string doesn’t actually end until it gets to a null-terminating character, which in this case is 0x00. If you follow the characters down the list you’ll see that the first 0x00 occurs after the word ‘works’ and its exclamation mark. Thus, the full string that we get at memory location 0x00403000 is:
Hello There!
Let’s see how this thing works!
That’s EXACTLY what appears in the dialog box isn’t it? How cool is that?
Also notice that the next string is “Disassemble me!!!” That’s the title of our window! And what’s the address of the first letter (D)? It’s 0x0040302e.
Does that number seem familiar? I hope so…
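To see how the addresses fall out, here is a Python sketch that rebuilds the bytes described above and walks through them looking for null terminators. It assumes the data section starts at 0x00403000, that the apostrophe is a plain ASCII character, and that nothing sits between the two strings except the single 0x00; `c_strings` is a made-up helper name.

```python
BASE = 0x00403000   # first address shown in the hex display

# The two strings from the dump, each ending in a 0x00 byte.
data = (b"Hello There!\r\nLet's see how this thing works!\x00"
        b"Disassemble me!!!\x00")

def c_strings(blob, base):
    """Yield (address, text) for every null-terminated string in blob."""
    offset = 0
    while offset < len(blob):
        end = blob.index(b"\x00", offset)          # find the terminator
        yield base + offset, blob[offset:end].decode("ascii")
        offset = end + 1                           # next string starts after it

found = list(c_strings(data, BASE))
for address, text in found:
    print(f"{address:#010x} {text!r}")
```

The second string lands at 0x0040302e, the same number the second push put on the stack.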
By now you’re wondering what all these pushes have to do with anything. Patience, Grasshopper: all will be revealed in a moment. Let’s recap what’s happened here first. The program has instructed the computer to push four values onto the stack: 0x0, 0x0040302e, 0x00403000, and 0x0. Each of these pushes places a 4 byte value on the stack, so the 0x0s are expanded to 0x00000000. This results in the stack looking like this:
| Code: |
0x00000000    <- top of stack (Last in / First out)
0x00403000
0x0040302e
0x00000000    <- bottom (First in / Last out) |
Now, look at the next line in the program!
* Reference To: USER32.MessageBoxA, Ord:0195h
|
:0040100E E807000000 Call 0040101A
By now I hope you realize that the * Reference To… is a comment made by W32DAsm. It’s telling you a lot about the next line, which says Call 0040101A. Again, the software is converting this number into something you can work with. At line 0x0040100e the program is telling the computer to call the MessageBoxA function (which it’s imported from User32.dll).
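Curious how E807000000 turns into Call 0040101A? On x86, E8 is the op-code for a relative CALL: the four operand bytes are a signed little-endian offset counted from the address of the next instruction. The sketch below works that out; `call_target` is a hypothetical helper name.

```python
import struct

def call_target(address, raw):
    """Return the destination of a 5-byte E8 (relative CALL) instruction."""
    if raw[0] != 0xE8 or len(raw) != 5:
        raise ValueError("not a 5-byte E8 call")
    (offset,) = struct.unpack("<i", raw[1:])   # signed 32-bit, little-endian
    return address + len(raw) + offset          # relative to the NEXT instruction

first = call_target(0x0040100E, bytes.fromhex("E807000000"))
second = call_target(0x00401015, bytes.fromhex("E806000000"))
print(f"{first:08X}")
print(f"{second:08X}")
```

Both results agree with the targets W32DAsm printed in the third column.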
If you go back and look at the API info on MessageBox you’ll recall that it requires four parameters be sent to it. The way the program does this is by (drum roll please)……… pushing them onto the stack. That’s what we’ve been looking at! Those first four lines were basically just getting ready to send information to the MessageBox function. They were pushed in reverse order so that when the MessageBox function goes to pop them back off, they’ll be in the correct order. Let’s go through the pops one at a time:
1st: MessageBox wants to know hWnd, the handle of the owner window. Since there is no other window the program sends 0x00000000, which is the numerical value for NULL.
2nd: MessageBox wants to know lpText, which is the text to display in the window. The program has sent along 0x00403000, which is the memory address of the start of the string “Hello There!…”.
3rd: MessageBox wants to know lpCaption, which is the text to display in the title bar of the window. The program has sent along 0x0040302e, which is the address of the first letter of the string “Disassemble me!!!”.
4th: MessageBox wants to know what other information to display in the window (special buttons, icons, etc). The program sends 0x00000000, basically saying, “No thank you…nothing fancy for me!”
Once MessageBox pops these four parameters off of the stack, it does its thing and the result is the box that we get to see. Now, the program doesn’t move on to execute its next line until the user clicks the OK button in the Message Box. This is because User32.dll has taken control of this program’s execution and hasn’t returned control yet. Once OK is clicked, User32.dll releases control of the CPU and program execution continues with the next line.
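The pairing of pushes and parameters can be replayed in a few lines of Python. This is a toy model that follows the "pop them off" description above, using a plain list as the stack; the values are the four operands from the listing and the names come from the MessageBox prototype.

```python
# The four pushes, in the order the program executes them.
pushes = [0x00000000, 0x0040302E, 0x00403000, 0x00000000]

stack = []
for value in pushes:
    stack.append(value)

# Parameter names in the order the API prototype lists them.
names = ["hWnd", "lpText", "lpCaption", "uType"]
arguments = {name: stack.pop() for name in names}   # top of the stack comes off first

for name, value in arguments.items():
    print(f"{name:<10}{value:#010x}")
```

Because the last value pushed is the first one taken off, the reversed push order lines up exactly with the prototype.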
:00401013 6A00 push 00000000
* Reference To: KERNEL32.ExitProcess, Ord:006Bh
|
:00401015 E806000000 Call 00401020
Two lines left!!! The first is another PUSH. Hmm….and the next is a call to ExitProcess. Does ExitProcess require any parameters? Yep…..one! What is the parameter? It’s an exit code. It’s not important here, so the program just sends 0. Kernel32.dll takes the parameter off of the stack and uses it to execute the ExitProcess function, which results in the program exiting. End of program!
That’s it…that’s the whole program.
Now, you may not feel like you’ve achieved a great deal, but there have been several things you’ve been introduced to today:
* Using W32DAsm to disassemble a file, and then saving the work so you
don’t have to do it again.
* Finding out what functions a program (or .dll file) imports and exports.
* Understanding how to find out information about an import or export using the Win32API reference.
* Using W32DAsm to pin down where certain text strings that appear in a
program are used in the program itself.
* Walking through the source code of a fully working Windows® program.
* Understanding a little about assembly language programming and two commands: PUSH and CALL.
* Understanding how numbers need to be stored in little-endian byte order
on an Intel® CPU executable.
===============================================
Exercises
Here are a few short exercises that you should try to complete. Send your answers to your mentor and they will check your work and respond appropriately.
1) Disassemble Notepad and answer the following questions:
a) What function does Notepad use to display short messages in a dialog box?
b) How many times is the function in part (a) called within the Notepad application?
c) At what memory location does the program begin execution?
d) What function do you think Notepad uses to close the application?
eT’s Reversing School
by evilTeach
2) Disassemble W32DAsm (did you already think to try this?) and answer the following questions:
a) What function does W32DAsm use to display short messages in a dialog box?
b) How many times is the function in part (a) called within the application?
c) At what memory location does the program begin execution?
d) What function do you think W32DAsm uses to close the application?
e) At what memory address is the window with the message: “Close—Are You Sure?” created?
f) What is displayed in the title bar of the window from part (e)?