Complete Tour of PE and ELF: An Introduction

March 3, by Security Ninja

I have decided to put together an end-to-end malware analysis course and extend it to memory forensics and detecting APTs. Though this might sound great, not everyone has the skills to deal with malware in general, and it requires a fair understanding of how malware works behind the scenes. Two of the most important prerequisites for analyzing malware are understanding the PE and ELF file structures and having a good knowledge of assembly language. I am starting this series with PE and ELF, which will span a few articles, followed by assembly language. After that, I will move on to static malware analysis and dynamic malware analysis, followed by memory forensics and dealing with APTs. So without wasting any time, let's start with the PE structure.

If you want to know what you will be dealing with here, take a look at this link.

Do not worry, it is not as bad as it looks, and we will cover only the portions that matter. I will also try to demonstrate each section with an example. I will be using PEview and CFF Explorer to dig into PE files.

Portable Executable (PE) is the executable file format for Windows. Common Windows PE file extensions include .exe, .dll, and .sys.

Before we examine the first structure, it is important to note that the numbers that appear are stored in little-endian format. For example, the 16-bit hex value 0x0123 is stored as the bytes 23 01.
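To see the byte order concretely, here is a minimal Python sketch using the standard struct module. It only illustrates little-endian storage of the value 0x0123 from the example; it does not read a PE file.

```python
import struct

# "<H" means: little-endian, unsigned 16-bit integer.
raw = struct.pack("<H", 0x0123)
print(" ".join(f"{b:02x}" for b in raw))   # least significant byte comes first

# Reading the two bytes back as little-endian recovers the original value.
value = struct.unpack("<H", raw)[0]
print(hex(value))

# The same two bytes read as big-endian give a different number,
# which is why byte order matters when reading header fields by hand.
swapped = int.from_bytes(raw, "big")
print(hex(swapped))
```

Every multi-byte field in the headers discussed in this article is read the same way, with only the width changing.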

The very first structure in a PE file is the 64-byte IMAGE_DOS_HEADER.

Two fields matter most here: e_magic, which holds the "MZ" signature, and e_lfanew, which holds the file offset of the next header.

Between IMAGE_DOS_HEADER and the structure that e_lfanew points to sits a DOS stub program, which prints “This program cannot be run in DOS mode.”
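As a rough sketch of how the DOS header can be read, the Python snippet below checks the "MZ" signature (the e_magic field) and pulls out e_lfanew, which sits in the last four bytes of the 64-byte header at offset 0x3C. The helper name read_dos_header and the synthetic sample buffer are made up for illustration; for a real file you would pass in the bytes read from disk.

```python
import struct

DOS_HEADER_SIZE = 64      # size of IMAGE_DOS_HEADER in bytes
E_LFANEW_OFFSET = 0x3C    # e_lfanew occupies the last 4 bytes of the header


def read_dos_header(data: bytes) -> int:
    """Validate the DOS header and return e_lfanew (hypothetical helper)."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short to hold IMAGE_DOS_HEADER")
    if data[:2] != b"MZ":                       # e_magic
        raise ValueError("missing MZ signature")
    # Read as an unsigned little-endian 32-bit value for simplicity.
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return e_lfanew


# Synthetic header for illustration only, not a real executable.
header = bytearray(DOS_HEADER_SIZE)
header[:2] = b"MZ"
struct.pack_into("<I", header, E_LFANEW_OFFSET, 0x80)
sample = bytes(header) + bytes(0x40)   # space where a DOS stub would sit

print(hex(read_dos_header(sample)))
```

The returned value is a plain file offset, so it can be used directly to seek to the next header.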

Taking the file offset stored in e_lfanew of IMAGE_DOS_HEADER, we map the next structure, known as IMAGE_NT_HEADERS.

This structure contains a 4-byte Signature ("PE\0\0"), an IMAGE_FILE_HEADER, and an IMAGE_OPTIONAL_HEADER.

In each data directory entry of the Optional Header, VirtualAddress is the Relative Virtual Address (RVA) of another structure, such as the import or export table. We will talk about this in great detail later on.
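The following Python sketch shows one way to walk IMAGE_NT_HEADERS once e_lfanew is known: it checks the "PE\0\0" signature, unpacks the 20-byte file header, and reads a few Optional Header fields plus the data directory entries. It assumes a 32-bit (PE32) Optional Header, where ImageBase is at offset 28 and the data directories start at offset 96; the function name parse_nt_headers and the synthetic test buffer are illustrative only.

```python
import struct

PE32_MAGIC = 0x10B   # Optional Header magic for 32-bit images


def parse_nt_headers(data: bytes, e_lfanew: int) -> dict:
    """Parse a PE32 IMAGE_NT_HEADERS at e_lfanew (hypothetical helper)."""
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER: 20 bytes right after the signature.
    (machine, number_of_sections, _timestamp, _symbols_ptr, _symbols_count,
     size_of_optional_header, _characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    opt = e_lfanew + 4 + 20                      # start of Optional Header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic != PE32_MAGIC:
        raise ValueError("this sketch only handles PE32")
    image_base, section_alignment, file_alignment = struct.unpack_from(
        "<III", data, opt + 28)
    (count,) = struct.unpack_from("<I", data, opt + 92)   # NumberOfRvaAndSizes
    # Each data directory entry is (VirtualAddress, Size), 8 bytes.
    directories = [struct.unpack_from("<II", data, opt + 96 + 8 * i)
                   for i in range(min(count, 16))]
    return {
        "machine": machine,
        "number_of_sections": number_of_sections,
        "size_of_optional_header": size_of_optional_header,
        "image_base": image_base,
        "file_alignment": file_alignment,
        "directories": directories,
    }


# Synthetic headers for illustration only, not a real executable.
optional = bytearray(224)                        # PE32 Optional Header size
struct.pack_into("<H", optional, 0, PE32_MAGIC)
struct.pack_into("<III", optional, 28, 0x400000, 0x1000, 0x200)
struct.pack_into("<I", optional, 92, 16)
struct.pack_into("<II", optional, 96 + 8 * 1, 0x2000, 0x50)  # index 1: imports
file_header = struct.pack("<HHIIIHH", 0x14C, 3, 0, 0, 0, len(optional), 0x0102)
sample = bytes(0x80) + b"PE\0\0" + file_header + bytes(optional)

info = parse_nt_headers(sample, 0x80)
print(hex(info["image_base"]), info["directories"][1])
```

Index 0 of the directory list is the export table and index 1 is the import table; the VirtualAddress values are RVAs, not file offsets.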

Sections

As mentioned earlier, a PE file consists of sections, which are a way to organize data according to what sort of data goes where. For example, code is placed in the .text section, read-only data goes in the .rdata section, global data goes in the .data section, and so on. Each section is described by an IMAGE_SECTION_HEADER structure. Notice that it has a union embedded in it.


The important fields in this structure are:

  • Name: The name of the section, stored as an 8-byte array of ASCII characters.
  • VirtualSize: This is referred to as Misc.VirtualSize since it sits within a union. It tells us the size of the section in memory.
  • VirtualAddress: This is the RVA relative to OptionalHeader.ImageBase (remember the ImageBase field in the Optional Header structure explained earlier). This is the section's offset in memory.
  • SizeOfRawData: This is the size of the raw data on disk, whose beginning is pointed to by PointerToRawData.
  • PointerToRawData: This is the offset relative to the beginning of the file. Remember, this is the section's offset on disk.
  • Characteristics: This tells us whether the section is readable, writable, or executable. For example, .rdata has only the READ flag set by default, since it holds read-only data. It also tells us whether the section contains initialized data. The IMAGE_SCN_MEM_NOT_CACHED flag indicates that the section cannot be cached, and the IMAGE_SCN_MEM_NOT_PAGED flag indicates that the section cannot be paged.
  • Important Note: If you examine files, you will sometimes see VirtualSize > SizeOfRawData. What? How is that possible? Code often contains uninitialized variables that take up no space on disk but get mapped into the .bss section in memory. This .bss section gets merged with other sections such as .data, thus increasing Misc.VirtualSize. To add to the confusion, you will sometimes also see VirtualSize < SizeOfRawData. Remember the FileAlignment field we discussed earlier: the raw data on disk is aligned to FileAlignment and so may include padding, making VirtualSize < SizeOfRawData.
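The fields above can be sketched in Python as follows. The snippet unpacks one 40-byte section header, decodes a handful of Characteristics flags, and converts an RVA inside the section to a file offset using VirtualAddress and PointerToRawData. The helper names and the synthetic .data header are assumptions made for illustration; the flag values are the standard IMAGE_SCN_* constants with the prefix dropped.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics: 40 bytes in total.
SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"

FLAGS = {
    0x00000020: "CNT_CODE",
    0x00000040: "CNT_INITIALIZED_DATA",
    0x00000080: "CNT_UNINITIALIZED_DATA",
    0x04000000: "MEM_NOT_CACHED",
    0x08000000: "MEM_NOT_PAGED",
    0x20000000: "MEM_EXECUTE",
    0x40000000: "MEM_READ",
    0x80000000: "MEM_WRITE",
}


def parse_section_header(raw: bytes) -> dict:
    """Unpack one IMAGE_SECTION_HEADER (hypothetical helper)."""
    (name, virtual_size, virtual_address, size_of_raw_data,
     pointer_to_raw_data, _reloc_ptr, _line_ptr, _reloc_count, _line_count,
     characteristics) = struct.unpack(SECTION_HEADER_FORMAT, raw)
    return {
        "name": name.rstrip(b"\0").decode("ascii", errors="replace"),
        "virtual_size": virtual_size,
        "virtual_address": virtual_address,
        "size_of_raw_data": size_of_raw_data,
        "pointer_to_raw_data": pointer_to_raw_data,
        "flags": [label for bit, label in FLAGS.items()
                  if characteristics & bit],
    }


def rva_to_file_offset(section: dict, rva: int) -> int:
    """Map an RVA backed by this section's raw data to a file offset."""
    start = section["virtual_address"]
    if not start <= rva < start + section["size_of_raw_data"]:
        raise ValueError("RVA is not backed by this section's data on disk")
    return rva - start + section["pointer_to_raw_data"]


# Synthetic .data header for illustration: larger in memory than on disk.
raw = struct.pack(SECTION_HEADER_FORMAT, b".data", 0x1800, 0x3000,
                  0x400, 0x1200, 0, 0, 0, 0, 0xC0000040)
section = parse_section_header(raw)

print(section["name"], section["flags"])
print(section["virtual_size"] > section["size_of_raw_data"])
print(hex(rva_to_file_offset(section, 0x3010)))
```

In this made-up header, the extra 0x1400 bytes of VirtualSize beyond SizeOfRawData stand for uninitialized data that exists only in memory, so RVAs in that range have no file offset and the conversion raises an error.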

Also note that, to calculate where the section headers start, you add SizeOfOptionalHeader to the starting offset of the Optional Header.
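That calculation can be sketched in Python like this. The Optional Header starts right after the 4-byte signature and the 20-byte file header, so the section table begins at e_lfanew + 4 + 20 + SizeOfOptionalHeader. The function names and the synthetic buffer are illustrative assumptions, not part of any library.

```python
import struct

SECTION_HEADER_SIZE = 40


def section_table_offset(data: bytes):
    """Return (offset of first section header, NumberOfSections)."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    file_header = e_lfanew + 4                   # skip the "PE\0\0" signature
    (number_of_sections,) = struct.unpack_from("<H", data, file_header + 2)
    (size_of_optional_header,) = struct.unpack_from("<H", data, file_header + 16)
    optional_header = file_header + 20           # file header is 20 bytes
    return optional_header + size_of_optional_header, number_of_sections


def section_names(data: bytes):
    """List section names by stepping through the section table."""
    offset, count = section_table_offset(data)
    names = []
    for i in range(count):
        start = offset + i * SECTION_HEADER_SIZE
        names.append(data[start:start + 8].rstrip(b"\0").decode("ascii"))
    return names


# Synthetic image for illustration only, not a real executable.
buf = bytearray(0x400)
buf[:2] = b"MZ"
struct.pack_into("<I", buf, 0x3C, 0x80)                     # e_lfanew
buf[0x80:0x84] = b"PE\0\0"
struct.pack_into("<HHIIIHH", buf, 0x84, 0x14C, 2, 0, 0, 0, 0xE0, 0x0102)
buf[0x178:0x178 + 5] = b".text"                             # first header
buf[0x178 + 40:0x178 + 45] = b".data"                       # second header
sample = bytes(buf)

offset, count = section_table_offset(sample)
print(hex(offset), count, section_names(sample))
```

Reading SizeOfOptionalHeader from the file header, rather than hard-coding it, keeps the calculation valid for both 32-bit and 64-bit images.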

Here is a list of common section names:

So we have covered a good deal of the PE structure, as you can see below.

In the next article, we will look at the remaining sections.

References

http://www.openrce.org/reference_library/files/reference/PE%rmat.pdf