Platform-specific Strings: An Introduction to BSTR Strings

Let’s continue the discussion on string types available for C++ programmers, this time introducing a Windows-specific string type: the BSTR.

BSTR is a string type used in Windows programming by COM, OLE Automation, and even by interop functions that pass strings between native C++ code and .NET managed code.

The acronym stands for Basic String, which is tied to its historical origins related to Visual Basic and OLE Automation.

So, what does this BSTR string type look like?

Well, it’s defined in the Windows SDK <wtypes.h> header like this (comments and annotations stripped out):

typedef OLECHAR *BSTR;

So, it’s a raw pointer to an OLECHAR. And what is an OLECHAR? If you use Visual Studio, you can right-click the OLECHAR symbol and select “Go To Definition” in the context menu, or just press F12. This leads to:

typedef WCHAR OLECHAR;

So, OLECHAR is basically a typedef for a WCHAR. And, recursively, what is a WCHAR?

Well, same F12 trick, and you land on the following line in <winnt.h>:

typedef wchar_t WCHAR;    // wc,   16-bit UNICODE character

So, “unwinding” the above search stack, BSTR is basically a typedef for a wchar_t* raw pointer.
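To see what the compiler makes of this typedef chain, here is a minimal sketch. It does not include the Windows SDK: the Mirror... names are hypothetical local copies of the three typedefs quoted above, declared only so that the snippet builds anywhere. The helper functions are likewise made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical local mirrors of the SDK typedefs quoted above
// (WCHAR, OLECHAR, BSTR), so no Windows header is needed.
typedef wchar_t        MirrorWCHAR;
typedef MirrorWCHAR    MirrorOLECHAR;
typedef MirrorOLECHAR* MirrorBSTR;

// True when the compiler sees the BSTR mirror and wchar_t* as one type.
constexpr bool kSameTypeToCompiler =
    std::is_same<MirrorBSTR, wchar_t*>::value;
static_assert(kSameTypeToCompiler, "the typedef chain collapses to wchar_t*");

// Declared as taking a "BSTR", yet it accepts any wchar_t* without a
// diagnostic: a typedef introduces an alias, not a new type.
inline bool TakesBstr(MirrorBSTR) { return true; }

inline bool PlainPointerIsAccepted() {
    wchar_t plain[] = L"Connie";  // an ordinary C-style wide string
    return TakesBstr(plain);      // compiles without complaint
}
```

The type system sees no difference between the two, so the compiler offers no protection here.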

So, does it mean that you can happily use both BSTR and wchar_t* C-style strings interchangeably in your C++ code?

No way! Unless you want to introduce nasty bugs in your code.

In fact, in addition to its raw typedef definition, a BSTR has a well-defined memory layout, and allocation and deallocation requirements.

Allocating and Freeing Memory for BSTRs

In particular, a BSTR must be allocated using COM-specific memory allocation functions, like SysAllocString (and similar functions such as SysAllocStringLen).

BSTR myBstr = SysAllocString(L"Connie");

And, symmetrically, it must be freed using SysFreeString:

SysFreeString(myBstr);
myBstr = nullptr; // avoid dangling pointers

In other words, you can’t just invoke new[] and delete[] or malloc and free to allocate and release memory for your BSTR strings. Again, that would cause nasty bugs in your code!

As a side note: These BSTR functions (SysAllocString, SysFreeString, etc.) are declared in the <OleAuto.h> header, which reminds us of the historical connection between BSTR and OLE Automation.
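One way to keep each allocation paired with its SysFreeString call is to let a smart pointer do the freeing. The following is a minimal sketch, not a production wrapper. The names BstrDeleter, UniqueBstr, MakeBstr, and BstrLength are hypothetical. The #else branch is an assumption added only so the sketch also builds off Windows: it contains simplified stand-ins, not the real implementations.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

#ifdef _WIN32
#include <windows.h>
#include <oleauto.h>  // SysAllocString, SysFreeString, SysStringLen
#else
// Not on Windows: simplified stand-ins (NOT the real implementations) that
// mimic only the behavior this sketch relies on.
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <cwchar>
typedef wchar_t* BSTR;
inline BSTR SysAllocString(const wchar_t* s) {
    if (!s) return nullptr;
    const std::size_t bytes = std::wcslen(s) * sizeof(wchar_t);
    auto* block = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + bytes + sizeof(wchar_t)));
    if (!block) return nullptr;
    const auto prefix = static_cast<std::uint32_t>(bytes);
    std::memcpy(block, &prefix, sizeof(prefix));
    // Copies the characters and the terminating NUL in one go.
    std::memcpy(block + sizeof(prefix), s, bytes + sizeof(wchar_t));
    return reinterpret_cast<BSTR>(block + sizeof(prefix));
}
inline void SysFreeString(BSTR b) {
    if (b) std::free(reinterpret_cast<unsigned char*>(b) - sizeof(std::uint32_t));
}
inline unsigned int SysStringLen(BSTR b) {
    if (!b) return 0;
    std::uint32_t bytes = 0;
    std::memcpy(&bytes, reinterpret_cast<unsigned char*>(b) - sizeof(bytes),
                sizeof(bytes));
    return static_cast<unsigned int>(bytes / sizeof(wchar_t));
}
#endif

// Deleter that releases a BSTR with SysFreeString, never delete[] or free.
struct BstrDeleter {
    void operator()(BSTR b) const noexcept { SysFreeString(b); }
};

// Owning handle: the BSTR is freed automatically when the handle goes away.
using UniqueBstr =
    std::unique_ptr<std::remove_pointer<BSTR>::type, BstrDeleter>;

// Allocates a BSTR copy of a C-style wide string; null on failure.
inline UniqueBstr MakeBstr(const wchar_t* text) {
    return UniqueBstr(SysAllocString(text));
}

// Length in characters; a null BSTR reports 0.
inline unsigned int BstrLength(const UniqueBstr& s) {
    return SysStringLen(s.get());
}
```

With this in place, MakeBstr(L"Connie") yields a handle that releases the string when it goes out of scope, so there is no manual SysFreeString call to forget. Real code would more likely use an existing wrapper such as _bstr_t or ATL’s CComBSTR.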

The BSTR Memory Layout

Note also that, despite its simple type-system definition of wchar_t*, a BSTR has a well-specified structure in memory.

In particular, considering the above “Connie” example, such a BSTR looks like this in memory:

[Figure: Memory layout of a BSTR string]
  1. A 4-byte (32-bit) header sits immediately before the wchar_t that the BSTR pointer points to. It is a 32-bit integer that stores the number of bytes in the string data that follows.
  2. A contiguous sequence of wchar_t’s follows, representing the string encoded in Unicode UTF-16.
  3. A wchar_t NUL terminator closes the string, i.e., two bytes (16 bits) cleared to zero: 0x0000.

As you can see, this is a concrete, well-defined data structure associated with the BSTR pointer: a length-prefix header, followed by the sequence of wchar_t’s, and a 2-byte NUL terminator. This is very different from a simple C-style string, like:

const wchar_t* str = L"Connie";

Note that the string length in the BSTR header is expressed in bytes. Let’s do some simple math: “Connie” has six characters; each character is a wchar_t, which occupies two bytes; so, 6 [wchar_t] * 2 [byte/wchar_t] = 12 bytes. So, 12 is the number of bytes occupied by the string data; this length value is stored in the BSTR header.

Note also that the terminating NUL is not counted in the length prefix.
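The layout and the byte arithmetic above can be reproduced in a small portable sketch. It builds an ordinary byte buffer with the same three parts, purely to make the numbers concrete. The function names are hypothetical, char16_t stands in for the 16-bit Windows wchar_t, and the result is not a real BSTR, which must still come from the SysAllocString family.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Builds a byte buffer shaped like the layout described above:
//   [4-byte length, in bytes][UTF-16 code units][2-byte NUL terminator]
// Illustration only: this buffer must never be passed where a BSTR is expected.
inline std::vector<unsigned char> BuildBstrLikeBytes(const std::u16string& text) {
    const std::uint32_t byteLength =
        static_cast<std::uint32_t>(text.size() * sizeof(char16_t));

    // Zero-initialized, so the trailing two terminator bytes are already 0.
    std::vector<unsigned char> bytes(
        sizeof(byteLength) + byteLength + sizeof(char16_t), 0);

    std::memcpy(bytes.data(), &byteLength, sizeof(byteLength));
    std::memcpy(bytes.data() + sizeof(byteLength), text.data(), byteLength);
    return bytes;
}

// Reads back the 32-bit length prefix stored at the start of the buffer.
inline std::uint32_t ReadLengthPrefix(const std::vector<unsigned char>& bytes) {
    std::uint32_t byteLength = 0;
    std::memcpy(&byteLength, bytes.data(), sizeof(byteLength));
    return byteLength;
}
```

For “Connie”, the prefix read back is 12 and the whole buffer is 4 + 12 + 2 = 18 bytes long, matching the math above. The terminator occupies space in the buffer but is not counted in the prefix.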

Another important feature of BSTR is that a NULL (or nullptr) BSTR is equivalent to an empty string. In other words, an empty string “” and a NULL BSTR must have the same semantics.

Note also that the sequence of contiguous wchar_t’s following the BSTR header can contain embedded NULs, as the BSTR is length-prefixed.

In addition, getting the length of a BSTR (expressed as a count of wchar_t’s) is a fast O(1) operation, as the string stores its length in the header. You can get the length of a BSTR by calling the SysStringLen API.
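The difference between the two ways of measuring a string can be sketched portably. The two helpers below are hypothetical and use char16_t in place of the 16-bit Windows wchar_t. One scans for the terminator, as wcslen does for a C-style string. The other derives the character count from a byte-length prefix, which is the idea behind SysStringLen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Length as a NUL-terminated scan sees it: walks the string and stops at the
// first 0x0000 code unit. O(n), and blind to anything after an embedded NUL.
inline std::size_t ScanLength(const char16_t* s) {
    std::size_t count = 0;
    while (s[count] != u'\0') {
        ++count;
    }
    return count;
}

// Length as a length-prefixed string sees it: the stored byte count divided
// by the size of one code unit. O(1), and embedded NULs are counted.
inline std::size_t LengthFromBytePrefix(std::uint32_t byteLength) {
    return byteLength / sizeof(char16_t);
}
```

For the five code units a, b, NUL, c, d, the stored byte length is 10, so the prefix-based count is 5, while the scan stops at 2. On Windows, the same contrast shows up between SysStringLen and wcslen for a BSTR created with SysAllocStringLen(L"ab\0cd", 5).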

Side Note: Using a BSTR as a Binary Array

As an interesting side note, the BSTR structure is sometimes used as a generic byte-array data structure instead of a proper string. Since it’s length-prefixed and can contain embedded zeros, an arbitrary sequence of bytes can be stuffed into the string-data part of the BSTR structure. However, I’d prefer a proper data structure for binary byte sequences. In the realm of OLE Automation, that would be a SAFEARRAY of 8-bit unsigned integers, rather than “semantically stretching” a BSTR.

4 thoughts on “Platform-specific Strings: An Introduction to BSTR Strings”

  1. Hello. Could I please ask you for some help with BSTR? I want to communicate via OLE with the Adobe AcroPDF object.

    Line 1 in my example effectively gives me a reference to the AcroPDF object

    PDF =@ wxautomation1.new(createinstance = “AcroPDF.PDF”, error=e)

    The documentation says

    VARIANT_BOOL LoadFile(BSTR fileName) to load a pdf file into the browser

    I want to open a document called “C:\FRIGO_COMPTA\35002.pdf”, which I converted to Unicode as below, starting at 43 and finishing at 66. The 32 at the start is the length of the string in hex (25 * 2 = 50 = 32 hex), and the 00 at the end is what I thought was needed to put a NULL at the end.
    s = “32 43 3a 5c 46 52 49 47 4f 5f 43 4f 4d 50 54 41 5c 33 33 37 38 32 2e 70 64 66 00”

    PDFFile =@ PDF.callmethod(method = “LoadFile(s)”, error=e)

    I get no error, but it doesn’t work. Could you possibly tell me where I am going wrong and, if possible, give me the BSTR I need to transmit? I admit to being a bit lost with the 4-byte integer required at the start.

    Thank you in advance for whatever you are willing to do.

    Ian Macpherson


  2. Hello, and thank you for replying. I have just one question.

    I created a blob with the following code in my application and thought that AcroPDF.FileOpen(BSTR filename) should understand it, but no go. I have seen other people on the Net saying it didn’t work, though.

    b = .toblob(“{36}{00}{00}{00}{43}{3a}{5c}{46}{52}{49}{47}{4f}{5f}{43}{4f}{4d}”)
    b = b + .toblob(“{50}{54}{41}{5c}{33}{35}{30}{30}{32}{2e}{70}{64}{66}{00}”)

    Any comment, please? Ian Macpherson


    1. Note that you cannot just allocate a sequence of bytes in memory, fill it with some data, and assume that is a BSTR, and pass that to functions expecting BSTR parameters.
      A BSTR must be allocated using the SysAlloc family of functions, like SysAllocString (and freed with the corresponding function).
