Assembly – Giovanni Dicanio's Blog

Somebody has to pass a string from standard cross-platform C++ code to Windows-specific C++ code, in particular to a function that takes a BSTR string as input. For the sake of simplicity, assume that the Windows-specific function has a prototype like this:

void DoSomething(BSTR bstr)

(In real-world code, that could be a COM interface method, as well.)

The string to pass to DoSomething is stored in a std::wstring instance. The caller might have converted the original string that was encoded in Unicode UTF-8 to a Unicode UTF-16 string, and stored it in a std::wstring.

They pass the wstring to DoSomething, invoking the wstring::data method, like this:

std::wstring ws{ L"Connie is learning C++" };
DoSomething(ws.data());

The code compiles successfully. But when the DoSomething function processes the input string, a weird bug happens. To try to debug this code and figure out what’s going on, the programmer builds this code in debug mode with Visual Studio. Basically, what they observe is that the text stored in the string is output correctly, but when the string length is queried, the returned value is abnormally big.

The reported string length is 2,130,640,638. That is more than 2 billion! Of course, this string length value is completely out of scale for a string like “Connie is learning C++”.

This is a small repro code for the DoSomething function:

void DoSomething(BSTR bstr)
{
    // Printing a BSTR with std::wcout?
    // ...See the note at the end of the article.
    std::wcout << L" String: [" << bstr << L"]\n";

    // *** Get the length of the input BSTR string ***
    auto len = SysStringLen(bstr);
    std::wcout << L" String length: " << len << '\n';
}

And this is the output:

The input BSTR string is printed out correctly; but the string length is reported as 2+ billion characters. — The input BSTR is printed out correctly, but the string length is reported as 2+ billion characters!

What is going on here? What is the origin of this bug? Why is the input string printed out correctly, but the same string is reported as being more than 2 billion characters long?

Well, the key to figure out this bug is understanding that a BSTR is not just a raw C-style wchar_t* string pointer, but it’s a pointer to a more complex and well-defined data structure.

In particular, a BSTR has a length prefix.

To get the input string length, the DoSomething function invokes the SysStringLen API. This is correct, as the input string passed to DoSomething is a BSTR. And to get the length of a BSTR string, you don’t call CRT functions like strlen or its derivatives like wcslen; instead, you call proper BSTR API functions, like SysStringLen.

The length of a BSTR is a value written as a header, before the contiguous sequence of characters pointed to by the BSTR. So, what SysStringLen likely does, is adjusting the input BSTR pointer to read the length-prefix header, and return that value back to the caller. Basically, getting the string length of a BSTR is fast O(1) constant-time operation (much faster than a O(N) operation performed by strlen/wcslen).

So, why is this 2,130,640,638 (2B+) magic number returned as length?

A Bit of Assembly Language Comes to the Rescue

I started programming at direct hardware level on the Commodore 64 and Amiga, and I love assembly!

These days I think modern C and C++ compilers do a great job in producing high-quality assembly code. However, I think that being able to read some assembly can come in handy even in these modern days!

So, let’s take a look at some assembly code associated with the SysStringLen function:

The assembly source code of the SysStringLen API, shown in Visual Studio. — SysStringLen’s assembly code shown in Visual Studio

The first assembly instruction in SysStringLen is:

test rcx, rcx

Followed by a JE conditional jump:

je SysStringLen+0Ch

This assembly code is basically checking if the RCX register is zero.

In fact, the x86/AMD64 assembly TEST instruction performs a bitwise AND on two operands. In this case, the two operands are both the value stored in the RCX register. If RCX contains the value zero (i.e. all its bits are 0), the AND operation results in zero, too. In this case, the ZF (Zero Flag) is set. As a consequence of that, the following JE instruction jumps to the instruction located at SysStringLen+0Ch, that is:

xor eax, eax

This assembly instruction performs a XOR on the content of EAX with itself. The result of that is zeroing out the EAX register.

Then, the function returns with a RET instruction.

So, what is going on here?

Well, the RCX register contains the input BSTR pointer. So, this initial assembly code is basically checking for a NULL BSTR, and, if that’s the case, it returns 0 back to the caller. This is because a NULL BSTR is considered an empty string, whose length is zero.

So, the above assembly code is basically equivalent to the following C/C++ code:

if (bstr == NULL) {
    return 0;
}

But this is not the case for the input BSTR string we are considering!

So, if you execute step by step the above code, the JE jump is not taken, and the next assembly instruction that gets executed is:

mov eax, dword ptr [rcx-4]

This assembly code is basically taking the value stored in RCX, which is the input BSTR pointer. Then it adjusts the pointer to point 4 bytes backward. In this way, the pointer is pointing to the length-prefix header! So, the BSTR length value, stored in this header, is written into the EAX register.

This is basically equivalent to the following C/C++ code:

UINT len = *((UINT*)(((BYTE*)bstr) - 4));

The following instruction to be executed is:

shr eax, 1

This is a right shift of the EAX register by one bit. The result of that operation is dividing the value of EAX by two. This is basically equivalent to:

len /= 2;

Why does SysStringLen do that?

Well, the reason for that is because the string length is expressed in byte count in the header. But the returned length is expressed as count of wchar_ts. Since each wchar_t occupies two bytes in memory, you have to divide by two to convert from count in bytes to count in wchar_ts.

Finally, the RET instruction returns back to the calling code. So, when SysStringLen returns, the caller will find the BSTR string length, expressed as count of wchar_ts, in the EAX register.

Memory Analysis and the “No Man’s Land” Byte Sequence

Now you know what the assembly code of the SysStringLen does. But you may still ask: “Why that 2+ billion string length??”.

Well, the final piece of this puzzle is taking a look at the memory content when the SysStringLen function is invoked.

Remember that the DoSomething function expected a BSTR. The caller passed the content of a std::wstring, invoking wstring::data instead. If you take a look at the computer memory, the situation is something like what is shown below:

Memory layout of the BSTR string bug. Instead of the expected length prefix, there is a 0xFDFDFDFD guard sequence indicating "No Man's Land". — The memory layout of the BSTR string bug: expecting a length prefix that wasn’t there.

Before the wstring sequence of characters, there are some bytes filled with the 0xFD pattern. In particular, the 0xfdfdfdfd sequence is used to mark the area of memory before a debug heap allocated block. This is a kind of “guard” byte sequence, that delimits debug heap allocated memory blocks. This is sometimes referred to as “no man’s land”.

Of course, SysStringLen assumes that the passed string pointer represents a BSTR, and the bytes immediately before the pointed memory represent a valid length-prefix header. But in this case the pointer is a simple raw C-style string pointer (that is owned by the std::wstring object), not a full-fledged BSTR with a length prefix.

So, SysStringLen reads the 0xfdfdfdfd byte sequence, and interprets it as a BSTR length prefix.

Note that 0xfdfdfdfd expressed in base ten corresponds to the integer value 4,261,281,277. That’s something around 4 billion, not the circa-2 billion value we get here. So, why we do get 2 billion and not 4 billion?

Well, don’t forget the SHR (shift to the right) instruction towards the end of the SysStringLen assembly code! In fact, as already discussed, SysStringLen uses SHR to divide the initial length value by two, as the length is stored in the BSTR header as a count of bytes, but the value returned by SysStringLen is expressed as a count of wchar_ts.

So, start from the “no man’s land” byte sequence 0xfdfdfdfd. Right-shift it by one bit, making it 0x7efefefe. Convert that value back to decimal, and you get 2,130,640,638, which is the initial “magic value” returned as the string length! So: Mystery Solved! 😊

Side Note: Why Was the BSTR Printed Out Correctly with wcout?

That is a very good question! Well, when you pass the BSTR pointer to wcout, it is interpreted as a raw C-style string pointer to a NUL-terminated string (wchar_t*). Since the BSTR contains a NUL terminator at its end, things work correctly: wcout prints the various string characters, and stops at the terminating NUL.

However, note that we were kind of lucky. In fact, if the BSTR contained embedded NULs (which is possible, as a BSTR is length-prefixed), wcout would have stopped at the first NUL in the sequence; so, in that case, only the first part of the BSTR string would have been printed out.

Tag: Assembly

The Case of the Two Billion Characters Long String

A Bit of Assembly Language Comes to the Rescue

Memory Analysis and the “No Man’s Land” Byte Sequence

Side Note: Why Was the BSTR Printed Out Correctly with wcout?