Comparing STL vs. ATL/MFC String Usage at the Windows API Boundaries

A comparison between the worlds of STL vs. ATL/MFC string usage at the Windows API boundaries. Plus a small suggestion to improve C++ standard library strings.

In previous articles we saw some options for using STL strings and ATL/MFC CString at the Windows API boundaries. Let’s do a quick refresher and comparison between these two “worlds”.

For the sake of simplicity, let’s assume Unicode builds (which have been the default since VS 2005) and consider std::wstring for the STL side of the comparison.

The Input String Case

When passing strings as input parameters to Windows API C-interface functions, you can invoke the c_str method on STL std::wstring instances. With CString, on the other hand, you can just pass the instance itself, as CString implements an implicit conversion operator to a C-style string pointer that the compiler invokes automatically. At first sight, the CString approach seems simpler (i.e. just pass the CString object), although modern C++ tends to avoid implicit conversions, so the explicit call to c_str required by STL strings sounds safer. (Anyway, if you prefer explicit method invocations, CString offers a GetString method as well.)
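To make the difference concrete, here is a minimal portable sketch of the two calling styles. Note the assumptions: DoSomethingStub is a hypothetical stand-in for a Windows API function taking a read-only string (it is not a real API), and ImplicitString is a toy class invented for this illustration that mimics only CString’s implicit pointer conversion and its GetString accessor.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical stand-in for a C-interface function that takes
// a read-only, NUL-terminated input string (think SetWindowText).
// It just returns the string length, so the call can be observed.
inline std::size_t DoSomethingStub(const wchar_t* text)
{
    return (text != nullptr) ? std::wcslen(text) : 0;
}

// Toy class (an assumption for illustration, NOT the real CString):
// it mimics CString's implicit conversion to a C-style string pointer.
class ImplicitString
{
public:
    explicit ImplicitString(const wchar_t* text) : m_text(text) {}

    // Implicit conversion operator, in the spirit of CString's PCWSTR operator
    operator const wchar_t*() const { return m_text.c_str(); }

    // Explicit accessor, in the spirit of CString::GetString
    const wchar_t* GetString() const { return m_text.c_str(); }

private:
    std::wstring m_text;
};

// STL side: the conversion must be spelled out with c_str()
inline std::size_t PassStlString(const std::wstring& text)
{
    return DoSomethingStub(text.c_str());
}

// CString-like side: the compiler inserts the conversion for you
inline std::size_t PassImplicitString(const ImplicitString& text)
{
    return DoSomethingStub(text);
}
```

Both helpers end up passing the same kind of pointer to the C-style function; the only difference is whether the conversion is visible at the call site.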

The Output String Case

Using an External Temporary Buffer – Both STL strings and ATL/MFC CString have constructor overloads that take a pointer to a raw character buffer that is assumed to be null-terminated, and build string objects from the content of that C-style buffer. This means that you can create an external temporary character buffer, pass a pointer to it as the output string parameter to the C-interface Windows API you want to invoke, and then build the result string object (both STL wstring and ATL/MFC CString) using a pointer to that intermediate buffer. In addition, an explicit length can be passed together with the pointer to the beginning of the buffer, in case you want or need to specify the string length explicitly rather than relying on the null terminator.
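On the STL side, the temporary-buffer approach could look like the following portable sketch. GetSomeTextStub is a hypothetical stand-in for a Windows API function with an output string parameter (its name, its fixed text and its return value are assumptions made for this illustration only).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <memory>
#include <string>

// Hypothetical stand-in for a C-interface API with an output string parameter.
// It writes a NUL-terminated string into the caller-provided buffer and returns
// the number of characters written (terminator excluded), or 0 if the buffer
// is missing or too small.
inline std::size_t GetSomeTextStub(wchar_t* buffer, std::size_t bufferLength)
{
    static const wchar_t kText[] = L"Connie";
    const std::size_t length = std::wcslen(kText);
    if (buffer == nullptr || bufferLength < length + 1)
    {
        return 0;
    }
    std::wmemcpy(buffer, kText, length + 1); // copy the terminating NUL as well
    return length;
}

inline std::wstring GetTextViaTemporaryBuffer(std::size_t bufferLength)
{
    // 1. Allocate the external temporary buffer (zero-initialized by make_unique)
    auto buffer = std::make_unique<wchar_t[]>(bufferLength);

    // 2. Let the C-style function write its result into that buffer
    const std::size_t written = GetSomeTextStub(buffer.get(), bufferLength);
    assert(written == 0 || written < bufferLength);

    // 3. Build the result string from the buffer, passing an explicit length.
    //    Relying on the terminator instead would be: std::wstring(buffer.get())
    return std::wstring(buffer.get(), written);
}
```

The price of this approach is the extra copy in step 3, from the temporary buffer into the string object.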

Working In-Place – For both STL strings and ATL/MFC CString it’s possible to work with an internal buffer. With STL strings, this buffer can be allocated using the resize method, and then accessed via the non-const pointer returned by the data method of the same string (the non-const data overload is available since C++17; with older standards you can use &s[0]). If the returned string is shorter than the allocated buffer length, you have to find the position of the null-terminator written by the invoked Windows API, and call the STL string’s resize method once again to set the proper size (“length”) of the result string object.
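Here is a minimal portable sketch of the resize / write / resize-again sequence for STL strings. As before, GetSomeTextStub is a hypothetical stand-in for the Windows API function, invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical stand-in for a C-interface API with an output string parameter:
// it writes a short NUL-terminated string into the caller-provided buffer.
inline void GetSomeTextStub(wchar_t* buffer, std::size_t bufferLength)
{
    static const wchar_t kText[] = L"Hello";
    const std::size_t length = std::wcslen(kText);
    assert(buffer != nullptr && bufferLength > length);
    std::wmemcpy(buffer, kText, length + 1); // copy the terminating NUL as well
}

inline std::wstring GetTextInPlace(std::size_t bufferLength)
{
    std::wstring result;

    // 1. Allocate the internal buffer; resize fills it with L'\0' characters
    result.resize(bufferLength);

    // 2. Let the C-style function write directly into the string's buffer.
    //    &result[0] also works before C++17; result.data() is fine from C++17 on.
    GetSomeTextStub(&result[0], result.size());

    // 3. The written text is shorter than the buffer: find the NUL terminator
    //    and shrink the string to its real length
    result.resize(std::wcslen(result.c_str()));

    return result;
}
```

Calling GetTextInPlace(32) returns a 5-character string, not a 32-character one padded with NULs, thanks to the final resize.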

On the other hand, with CString you can use the GetBuffer/ReleaseBuffer method pair: You can allocate the internal CString buffer specifying a proper (minimum) size invoking GetBuffer, then pass the pointer it returns on success to Windows C-interface APIs, and finally invoke CString::ReleaseBuffer to let the CString object update its internal state to properly store the null-terminated string written by the called function into the provided buffer.

Summary Table of the Various Cases

The following table summarizes the various cases discussed so far in a compact form:

STL vs. ATL/MFC string usage at the Windows API boundaries – Summary table

I think that for the “working in-place” output sub-case, CString is more convenient than STL strings, as:

  1. You don’t have to specify any initial value for filling the buffer allocated with GetBuffer; on the other hand, with STL strings the resize method (or, equivalently, the string constructor that takes a count of characters to repeat) always fills the buffer, either with a character you specify or with the default null character. So the CString::GetBuffer method is also likely more efficient, as it doesn’t need to fill the allocated buffer (at least in release builds).
  2. It’s possible to allocate a larger-than-needed buffer with GetBuffer (after all, you pass the safe minimum required buffer length to this method), then have the Windows API function write a shorter null-terminated string in that buffer. The ReleaseBuffer method will automatically scan the buffer content for the string’s null-terminator, and will update the CString object’s internal state (e.g. the string length) accordingly. This nice feature (scan until the null-terminator and properly set the string size) is not available with STL strings, as there is no such thing as a resize_until_null method.

A Small Suggestion for Improving STL Strings Interoperability with C-Interface Functions (Including Windows APIs)

So, here’s a small suggestion for improving the C++ standard library strings: It would be nice to have something like get_buffer and release_buffer methods available for STL strings, following the same semantics as CString’s GetBuffer and ReleaseBuffer methods, with:

1. No need to specify an initial character to fill the STL string object when the internal buffer is allocated.

2. Automatically set the size of the final string object based on the null-terminator written into the internal buffer.
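Until something like that is available, the second point can be approximated with a small helper. The following is only a sketch: ResizeUntilNull, MakeStringFromCApi and WriteAbcStub are hypothetical names made up for this illustration, not standard library functions. Note that this sketch still pays the cost of zero-filling the buffer, so it does not address the first point. (As far as I know, C++23 added std::basic_string::resize_and_overwrite, which goes in that direction; the code below sticks to older standards.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: shrink the string to the first NUL character, if any.
// This plays the role of CString::ReleaseBuffer's terminator scan.
inline void ResizeUntilNull(std::string& text)
{
    const std::size_t position = text.find('\0');
    if (position != std::string::npos)
    {
        text.resize(position);
    }
}

// Hypothetical helper: allocate a buffer of the requested length, let 'fill'
// write a NUL-terminated string into it, then trim the string to that terminator.
// 'fill' is any callable with the signature void(char* buffer, std::size_t length).
template <typename FillFunction>
std::string MakeStringFromCApi(std::size_t bufferLength, FillFunction fill)
{
    std::string result(bufferLength, '\0'); // still zero-fills the buffer
    if (bufferLength > 0)
    {
        fill(&result[0], bufferLength);
    }
    ResizeUntilNull(result);
    return result;
}

// Hypothetical stand-in for a C-interface API writing into an output buffer
inline void WriteAbcStub(char* buffer, std::size_t bufferLength)
{
    std::strncpy(buffer, "abc", bufferLength);
}
```

For example, MakeStringFromCApi(16, WriteAbcStub) allocates 16 characters but returns a string of length 3.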

How to Use ATL/MFC CString at the Windows API Boundaries

Let’s discuss how CString C++ objects can be used at the boundaries of C-interface Windows APIs, considering both the input and output string cases.

We have two cases to consider: Let’s start with the input read-only string case (which is the easiest one).

The Input String Case

If you have a CString instance and you want to pass it to a Windows API function that expects an input read-only string, you can simply pass the CString instance itself! It’s as simple as this:

//
// *** Input string case ***
// 

// Some string to pass as input parameter
CString myString = TEXT("Connie"); 

// Just pass the CString object to the Windows API function
// that expects an input string, like e.g. SetWindowText
DoSomething(myString);

If you compile your C++ code in Unicode (UTF-16) build mode (which has been the default since Visual Studio 2005), CString is a wchar_t-based string, and the input string parameter of the Windows API function that follows the TCHAR model is typically a PCWSTR, which is basically a null-terminated const wchar_t* C-style string pointer. In this case CString offers an implicit conversion operator to PCWSTR, so you can simply pass the CString object, and the C++ compiler will automatically invoke the PCWSTR conversion operator to pass the given string as a read-only input string parameter.

The Output String Case

Now let’s consider the output case, i.e. you are invoking a Windows API function that expects an output string parameter. Usually, in C-interface Windows API functions this is represented by a non-const pointer to a raw C-style character array, in particular a wchar_t* (or PWSTR) in the Unicode UTF-16 form.

You (i.e. the caller) pass a non-const pointer to a writable wchar_t buffer, and the Windows API function will fill this caller-provided buffer with proper characters, including a terminating NUL. This is all C stuff, but at the end of the process what you really want is the result string to be stored in a C++ CString object. So, how can you bridge these two worlds of C-style null-terminated string pointers and C++ CString class instances?

Option 1: Using an Intermediate Buffer

Well, one option would be to allocate an intermediary character buffer, then let the Windows API function fill it with its own result text, and finally create a CString object initialized with the content of that external intermediary buffer. For example:

//
// The Output String Case -- Using a Temporary Buffer
//

// 1. Allocate a temporary buffer to pass to the Windows API function
auto buffer = std::make_unique< wchar_t[] >(bufferLength);

// 2. Call the Windows API function, passing the address of the temporary buffer.
// The Windows API function will write its string into that buffer,
// including a NUL terminator, as per usual C string convention.
GetSomeText(
    buffer.get(), // <-- starting address of the output buffer
    bufferLength, // <-- size of the buffer (e.g. in wchar_t's)
    /* ... other parameters ... */
);

// 3. Create a CString object initialized with the NUL-terminated
// string stored in the temporary buffer previously allocated
CString result(buffer.get());

// NOTE: Since the temporary buffer was created with make_unique,
// it will be *automatically* released.
// Thank you C++ destructors!

Option 2: Working In-place with the CString’s Own Buffer

Instead of creating an external temporary buffer, passing its address to the Windows API function to let it write the output string, and then copying the result string into a CString object, you can use another option that CString offers.

In fact, you can request CString objects to allocate some internal memory, and get write access to that. In this way, you can directly pass the address of that CString’s own internal memory to the desired Windows API function, and let it write the result string directly inside the CString’s internal buffer, without the need of creating an external intermediary buffer, and making an additional string copy (from the temporary buffer to the CString object).

The method that CString offers to allocate an internal character buffer is GetBuffer. You can specify to it the minimum required buffer length. On success, this method returns a non-const pointer to the beginning of the buffer memory, which you can pass to the Windows API function expecting an output string parameter.

Then, after the called function has written its result string into the provided buffer (including the NUL-terminator), you can invoke the CString::ReleaseBuffer method to let the CString object update its internal state to properly store the NUL-terminated string previously written in its own buffer.

In C++ code, this process looks like this:

//
// The Output String Case -- Using CString's Internal Buffer
//

// This CString object will store the output result string
CString result;

// Allocate a CString buffer of proper size, 
// and return a _non-const_ pointer to it
wchar_t* pBuffer = result.GetBuffer(bufferLength);

// Pass the pointer to the internal CString buffer
// and the buffer length to the Windows API function
// expecting an output string parameter
GetSomeText(
    pBuffer,      // <-- starting address of the buffer
    bufferLength, // <-- size of the buffer (e.g. in wchar_t's)
    /* ... other parameters ... */
);

// We assume that the Windows API function has written
// a properly NUL-terminated string into the provided buffer.
// So we can invoke CString::ReleaseBuffer to update
// the CString object's internal state, 
// and release control of the buffer.
result.ReleaseBuffer();

// It's good practice to clear the buffer pointer to avoid subtle bugs
// caused by referencing it after the ReleaseBuffer call
pBuffer = nullptr;

// Now you can happily use the CString result object in your code!

As you can see, in this second case, working in-place with the CString’s internal buffer, the output string characters are written only once inside the CString object, instead of being first written to an external intermediate buffer and then copied into the result CString object.

How to Load CString from Resources

Very easy! It’s just a matter of invoking a constructor with a type cast.

Last time we saw how to load a string resource into a wstring. As already explained, ATL/MFC’s CString does offer very convenient methods for Win32 C++ programming. So, how can you load a string resource into a CString instance?

Well, it’s very simple! Just pass the string ID to a CString constructor overload with a proper type cast:

// Load string with ID IDS_MY_STRING from resources 
// into a CString instance
CString myString( (PCTSTR) IDS_MY_STRING );

You can even #define a simple preprocessor macro that takes the string resource ID and creates a CString object storing the string resource:

#define _S(stringResId) (CString((PCTSTR) (stringResId)))

When you need to pass a (localized) string loaded from resources to a Win32 API or ATL/MFC class member function that expects a string pointer (typically in the form of LPCTSTR/PCTSTR, i.e. const TCHAR*), then you can simply use the convenient macro defined above:

// Show a message-box to the user, 
// displaying strings loaded from resources
MessageBox(nullptr,
           _S(IDS_SOME_MESSAGE_FOR_THE_USER),
           _S(IDS_SOME_TITLE),
           MB_OK);

How does that work?

Well, first the _S macro defined above creates a (temporary) CString object and loads the string resource into it. Then, the implicit LPCTSTR conversion operator provided by CString is invoked by the C++ compiler, so the (read-only) string pointer is passed to the proper parameter of the MessageBox API.

See how simple that is compared to using std::wstring, where we needed to create an ad hoc non-trivial function to load wstring from resources!

CString or std::string? That Is The Question (2023 Revisited)

Revisiting one of my first blog posts from 2010. Should you pick CString or std::string? Based on what context (ATL, MFC, cross-platform code)? And why?

…with (more) apologies to Shakespeare 🙂

I discussed that in 2010 at the beginning of my blog journey (on the now defunct MSMVPs blog web site – Thank You Very Much Internet Archive!).

It’s interesting to revisit that post today toward the end of 2023, more than 10 years later.

So, should we use CString or std::string class to store and manage strings in our C++ code?

Well, if there is a need of writing portable C++ code, the choice should be std::string, which is part of the C++ standard library.

(Me, on January 4th, 2010)

Still true today. Let’s also add that we can use std::string with Unicode UTF-8-encoded text to represent international text.

But, in the context of C++ Win32 programming (using ATL or MFC), I find CString class much more convenient than std::string.

These are some reasons:

Again, I think that is still true today. Now let’s see the reasons why:

1) CString allows loading strings from resources, which is good also for internationalization.

Still valid today. (You have to write additional code to do that with STL strings.)

2) CString offers a convenient FormatMessage method (which is good for internationalization, too; see for example the interesting problem of “Yoda speak” […])

Again, still true today, although C++20 (20+ years after MFC; see footnote 1) added std::format. There’s also something from Boost (the Boost.Format library).

3) CString integrates well with Windows APIs (the implicit LPCTSTR operator comes in handy when passing instances of CString to Windows APIs, like e.g. SetWindowText).

Still valid today.

4) CString is reference counted, so moving instances of CString around is cheap.

Well, as discussed in a previous blog post, the Microsoft Visual C++ compiler and C++ Standard Library implementation have been improved a lot since VS 2008, and now the performance of STL’s strings is better than CString, at least for adding many strings to a vector and sorting the string vector.

5) CString offers convenient methods to e.g. tokenize strings, to trim them, etc.

This is still a valid reason today. 20+ years later, with C++20 they finally added some convenient methods to std::string, like starts_with and ends_with, but this is very little and honestly very late (but, yes, better late than never).
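To give an idea of the kind of helper code you end up writing yourself on the STL side, here is a minimal sketch of a trim function for std::wstring. Trim is a hypothetical helper name, and the default set of whitespace characters is an assumption chosen for this illustration; it is not meant to replicate the exact behavior of CString::Trim.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return a copy of 'text' with the leading and trailing
// characters listed in 'whitespace' removed.
inline std::wstring Trim(const std::wstring& text,
                         const wchar_t* whitespace = L" \t\r\n")
{
    const std::wstring::size_type first = text.find_first_not_of(whitespace);
    if (first == std::wstring::npos)
    {
        // The string is empty or made of whitespace only
        return std::wstring();
    }

    const std::wstring::size_type last = text.find_last_not_of(whitespace);
    return text.substr(first, last - first + 1);
}
```

With CString, the equivalent operation is a single call to a method that is already there.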

So, CString is still a great string option for Windows C++ code that uses ATL and/or MFC. It’s also worth noting that you can still use CString at the ATL/MFC/Win32 boundary, and then convert to std::wstring or std::string for some more complex data structures (for example, something that would benefit from move semantics), or for better integration with STL algorithms or Boost libraries, or for cross-platform portions of C++ code.

  1. I used and loved Visual C++ 6 (which was released in 1998), and its MFC implementation already offered a great CString class with many convenient methods including those discussed here. So, the time difference between that and C++20 is more than 20 years!

How to Convert Between ATL/MFC’s CString and std::wstring

This is an easy job, but with some gotchas.

In the previous series of articles on Unicode conversions, we saw how to perform various conversions, including ATL/STL mixed ones between Unicode UTF-16 CString and UTF-8 std::string.

Now, let’s assume that you have a Windows C++ code base using MFC or ATL, and the CString class. In Unicode builds (which have been the default in VS since Visual Studio 2005!), CString is a UTF-16 string class. You want to convert between that and the C++ Standard Library’s std::wstring.

How can you do that?

Well, in the Visual C++ implementation of the C++ standard library on Windows, std::wstring stores Unicode UTF-16-encoded text. (Note that, as already discussed in a previous blog post, this behavior is not portable to other platforms. But since we are discussing the case of an ATL or MFC code base here, we are already in the realm of Windows-specific C++ code.)

So, we have a match between CString and wstring here: they use the same Unicode encoding, as they both store Unicode UTF-16 text! Hooray! 🙂

So, the conversion between objects of these two classes is pretty simple. For example, you can use some C++ code like this:

//
// Conversion functions between ATL/MFC CString and std::wstring
// (Note: We assume Unicode build mode here!)
//

#if !defined(UNICODE)
#error This code requires Unicode build mode.
#endif

//
// Convert from std::wstring to ATL CString
//
inline CString ToCString(const std::wstring& ws)
{
    if (!ws.empty())
    {
        ATLASSERT(ws.length() <= INT_MAX);
        return CString(ws.c_str(), static_cast<int>(ws.length()));
    }
    else
    {
        return CString();
    }
}

//
// Convert from ATL CString to std::wstring
//
inline std::wstring ToWString(const CString& cs)
{
    if (!cs.IsEmpty())
    {
        return std::wstring(cs.GetString(), cs.GetLength());
    }
    else
    {
        return std::wstring();
    }
}

Note that, since std::wstring’s length is expressed as a size_t, while CString’s length is expressed using an int, the conversion from wstring to CString is not always possible, in particular for gigantic strings. For that reason, I used a debug-build ATLASSERT check on the input wstring length in the ToCString function. This aspect was discussed in more detail in my previous blog post on unsafe conversions from size_t to int.
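Since ATLASSERT is compiled out in release builds, you may prefer a check that is always active. The following is a minimal portable sketch of such a check; FitsInInt and CheckedSizeToInt are hypothetical helper names, and throwing std::overflow_error is just one possible choice for reporting the error.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: can this size_t value be represented as an int?
inline bool FitsInInt(std::size_t value)
{
    return value <= static_cast<std::size_t>(INT_MAX);
}

// Hypothetical helper: convert size_t to int, throwing instead of
// silently truncating when the value is too large.
inline int CheckedSizeToInt(std::size_t value)
{
    if (!FitsInInt(value))
    {
        throw std::overflow_error("size_t value is too large to fit in an int");
    }
    return static_cast<int>(value);
}
```

In a function like ToCString, the length passed to the CString constructor could then come from CheckedSizeToInt(ws.length()) instead of a plain static_cast.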

Converting Between Unicode UTF-16 CString and UTF-8 std::string

Let’s continue the Unicode conversion series, discussing an interesting case of “mixed” CString/std::string UTF-16/UTF-8 conversions.

In previous blog posts of this series we saw how to convert between Unicode UTF-16 and UTF-8 using ATL/MFC’s CStringW/A classes and C++ Standard Library’s std::wstring/std::string classes.

In this post I’ll discuss another interesting scenario: Consider the case that you have a C++ Windows-specific code base, for example using ATL or MFC. In this portion of the code the CString class is used. The code is built in Unicode mode, so CString stores Unicode UTF-16-encoded text (in this case, CString is actually a CStringW class).

On the other hand, you have another portion of C++ code that is standard cross-platform and uses only the standard std::string class, storing Unicode text encoded in UTF-8.

You need a bridge to connect these two “worlds”: the Windows-specific C++ code that uses UTF-16 CString, and the cross-platform C++ code that uses UTF-8 std::string.

[Figure: Windows-specific C++ code that uses UTF-16 CString interacting with portable standard C++ code that uses UTF-8 std::string]

Let’s see how to do that.

Basically, you have to do a kind of “code genetic-engineering” between the code that uses ATL classes and the code that uses STL classes.

For example, consider the conversion from UTF-16 CString to UTF-8 std::string.

The function declaration looks like this:

// Convert from UTF-16 CString to UTF-8 std::string
std::string ToUtf8(CString const& utf16);

Inside the function implementation, let’s start with the usual check for the special case of empty strings:

std::string ToUtf8(CString const& utf16)
{
    // Special case of empty input string
    if (utf16.IsEmpty())
    {
        // Empty input --> return empty output string
        return std::string{};
    }

Then you can invoke the WideCharToMultiByte API to figure out the size of the destination UTF-8 std::string:

// Safely fail if an invalid UTF-16 character sequence is encountered
constexpr DWORD kFlags = WC_ERR_INVALID_CHARS;

const int utf16Length = utf16.GetLength();

// Get the length, in chars, of the resulting UTF-8 string
const int utf8Length = ::WideCharToMultiByte(
    CP_UTF8,        // convert to UTF-8
    kFlags,         // conversion flags
    utf16,          // source UTF-16 string
    utf16Length,    // length of source UTF-16 string, in wchar_ts
    nullptr,        // unused - no conversion required in this step
    0,              // request size of destination buffer, in chars
    nullptr,        // unused
    nullptr         // unused
);
if (utf8Length == 0)
{
   // Conversion error: capture error code and throw
   ...
}

Then, as already discussed in previous articles in this series, once you know the size for the destination UTF-8 string, you can create a std::string object capable of storing a string of proper size, using a constructor overload that takes a size parameter (utf8Length) and a fill character (‘ ‘):

// Make room in the destination string for the converted bits
std::string utf8(utf8Length, ' ');

To get write access to the std::string object’s internal buffer, you can invoke the std::string::data method (its non-const overload is available since C++17):

char* utf8Buffer = utf8.data();
ATLASSERT(utf8Buffer != nullptr);

Now you can invoke the WideCharToMultiByte API for the second time, to perform the actual conversion, using the destination string of proper size created above, and return the result utf8 string to the caller:

// Do the actual conversion from UTF-16 to UTF-8
int result = ::WideCharToMultiByte(
    CP_UTF8,        // convert to UTF-8
    kFlags,         // conversion flags
    utf16,          // source UTF-16 string
    utf16Length,    // length of source UTF-16 string, in wchar_ts
    utf8Buffer,     // pointer to destination buffer
    utf8Length,     // size of destination buffer, in chars
    nullptr,        // unused
    nullptr         // unused
);
if (result == 0)
{
    // Conversion error: capture error code and throw
    ...
}

    return utf8;

} // End of function ToUtf8

I developed an easy-to-use C++ header-only library containing compilable code implementing these Unicode UTF-16/UTF-8 conversions using CString and std::string; you can find it in this GitHub repo of mine.

Converting Between Unicode UTF-16 and UTF-8 Using ATL CString and Direct Win32 API Calls

Let’s step up from the previous ATL CW2A/CA2W helpers, and write more efficient (and more customizable) C++ code that directly invokes Win32 APIs for doing Unicode UTF-16/UTF-8 conversions.

Last time we saw how to convert text between Unicode UTF-16 and UTF-8 using a couple of ATL helper classes (CW2A and CA2W). While this can be a good initial approach to “break the ice” with these Unicode conversions, we can do better.

For example, the aforementioned ATL helper classes create their own temporary memory buffer for the conversion work. Then, the result of the conversion must be copied from that temporary buffer into the destination CStringA/W’s internal buffer. On the other hand, if we work with direct Win32 API calls, we will be able to avoid the intermediate CW2A/CA2W’s internal buffer, and we could directly write the converted bytes into the CStringA/W’s internal buffer. That is more efficient.

In addition, directly invoking the Win32 APIs allows us to customize their behavior, for example specifying ad hoc flags that better suit our needs.

Moreover, in this way we will have more freedom on how to signal error conditions: Throwing exceptions? And what kind of exceptions? Throwing a custom-defined exception class? Use return codes? Use something like std::optional? Whatever, you can just pick your favorite error-handling method for the particular problem at hand.
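To illustrate this freedom of choice, here is a portable sketch that exposes the same conversion with two error-reporting styles: a return code and an exception. Note the assumptions: TryToUtf8 and ToUtf8OrThrow are hypothetical names, std::u16string and std::string stand in for CStringW and CStringA, and the hand-written encoder replaces the WideCharToMultiByte call only to keep the example self-contained. It rejects unpaired surrogates, which is similar in spirit to failing on invalid input sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Style 1: return code. Returns false on invalid UTF-16 input
// (an unpaired surrogate); 'utf8' is modified only on success.
inline bool TryToUtf8(const std::u16string& utf16, std::string& utf8)
{
    std::string result;
    result.reserve(utf16.size() * 3); // upper bound on the UTF-8 length
    for (std::size_t i = 0; i < utf16.size(); ++i)
    {
        char32_t codePoint = utf16[i];
        if (codePoint >= 0xD800 && codePoint <= 0xDBFF)
        {
            // High surrogate: it must be followed by a low surrogate
            if (i + 1 >= utf16.size()) return false;
            const char32_t low = utf16[i + 1];
            if (low < 0xDC00 || low > 0xDFFF) return false;
            codePoint = 0x10000 + ((codePoint - 0xD800) << 10) + (low - 0xDC00);
            ++i; // the low surrogate has been consumed
        }
        else if (codePoint >= 0xDC00 && codePoint <= 0xDFFF)
        {
            return false; // stray low surrogate
        }

        if (codePoint < 0x80)
        {
            result += static_cast<char>(codePoint);
        }
        else if (codePoint < 0x800)
        {
            result += static_cast<char>(0xC0 | (codePoint >> 6));
            result += static_cast<char>(0x80 | (codePoint & 0x3F));
        }
        else if (codePoint < 0x10000)
        {
            result += static_cast<char>(0xE0 | (codePoint >> 12));
            result += static_cast<char>(0x80 | ((codePoint >> 6) & 0x3F));
            result += static_cast<char>(0x80 | (codePoint & 0x3F));
        }
        else
        {
            result += static_cast<char>(0xF0 | (codePoint >> 18));
            result += static_cast<char>(0x80 | ((codePoint >> 12) & 0x3F));
            result += static_cast<char>(0x80 | ((codePoint >> 6) & 0x3F));
            result += static_cast<char>(0x80 | (codePoint & 0x3F));
        }
    }
    utf8.swap(result);
    return true;
}

// Style 2: exceptions, built on top of the return-code version
inline std::string ToUtf8OrThrow(const std::u16string& utf16)
{
    std::string utf8;
    if (!TryToUtf8(utf16, utf8))
    {
        throw std::invalid_argument("Invalid UTF-16 input sequence");
    }
    return utf8;
}
```

The throwing version is a thin wrapper over the return-code one, so callers can pick whichever style fits the code around them.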

So, let’s start designing our custom Unicode UTF-16/UTF-8 conversion functions. First, we have to pick a couple of classes to store UTF-16-encoded text and UTF-8-encoded text. That’s easy: In the context of ATL (and MFC), we can pick CStringW for UTF-16, and CStringA for UTF-8.

Now, let’s focus on the prototype of the conversion functions. We could pick something like this:

// Convert from UTF-16 to UTF-8
CStringA Utf8FromUtf16(CStringW const& utf16);

// Convert from UTF-8 to UTF-16
CStringW Utf16FromUtf8(CStringA const& utf8);

With this coding style, considering the first function, the “Utf16” part of the function name is located near the corresponding “utf16” parameter, and the “Utf8” part is near the returned UTF-8 string. In other words, in this way we put the return on the left, and the argument on the right:

CStringA resultUtf8 = Utf8FromUtf16(utf16Text);
//                            ^^^^^^^^^^^^^^^  <--- Argument: UTF-16

CStringA resultUtf8 = Utf8FromUtf16(utf16Text);
//       ^^^^^^^^^^^^^^^^^  <--- Return: UTF-8

Another approach is something more similar to the various std::to_string overloads implemented by the C++ Standard Library:

// Convert from UTF-16 to UTF-8
CStringA ToUtf8(CStringW const& utf16);

// Convert from UTF-8 to UTF-16
CStringW ToUtf16(CStringA const& utf8);

Let’s pick this second style.

Now, let’s focus on the UTF-16-to-UTF-8 conversion, as the inverse conversion is pretty similar.

// Convert from UTF-16 to UTF-8
CStringA ToUtf8(CStringW const& utf16)
{
    // TODO ...
}

The first thing we can do inside the conversion function is to check the special case of an empty input string. In this case, we’ll just return an empty output string:

// Convert from UTF-16 to UTF-8
CStringA ToUtf8(CStringW const& utf16)
{
    // Special case of empty input string
    if (utf16.IsEmpty())
    {
        // Empty input --> return empty output string
        return CStringA();
    }

Now let’s focus on the general case of non-empty input string. First, we need to figure out the size of the result UTF-8 string. Then, we can allocate a buffer of proper size for the result CStringA object. And finally we can invoke a proper Win32 API for doing the conversion.

So, how can you get the size of the destination UTF-8 string? You can invoke the WideCharToMultiByte Win32 API, like this:

// Safely fail if an invalid UTF-16 character sequence is encountered
constexpr DWORD kFlags = WC_ERR_INVALID_CHARS;

const int utf16Length = utf16.GetLength();

// Get the length, in chars, of the resulting UTF-8 string
const int utf8Length = ::WideCharToMultiByte(
    CP_UTF8,            // convert to UTF-8
    kFlags,             // conversion flags
    utf16,              // source UTF-16 string
    utf16Length,        // length of source UTF-16 string, in wchar_ts
    nullptr,            // unused - no conversion required in this step
    0,                  // request size of destination buffer, in chars
    nullptr, nullptr    // unused
);

Note that the interface of that C Win32 API is non-trivial and error prone. Anyway, after reading its documentation and doing some tests, you can figure the parameters out.

If this API fails, it will return 0. So, here you can write some error handling code:

if (utf8Length == 0)
{
    // Conversion error: capture error code and throw
    AtlThrowLastWin32();
}

Here I used the AtlThrowLastWin32 function, which basically invokes the GetLastError Win32 API, converts the returned DWORD error code to HRESULT, and invokes AtlThrow with that HRESULT value. Of course, you are free to define your custom C++ exception class and throw it in case of errors, or use whatever error-reporting method you like.

Now that we know how many chars (i.e. bytes) are required to represent the result UTF-8-encoded string, we can create a CStringA object, and invoke its GetBuffer method to allocate an internal CString buffer of proper size:

// Make room in the destination string for the converted bits
CStringA utf8;
char* utf8Buffer = utf8.GetBuffer(utf8Length);
ATLASSERT(utf8Buffer != nullptr);

Now we can invoke the aforementioned WideCharToMultiByte API again, this time passing the address of the allocated destination buffer and its size. In this way, the API will do the conversion work, and will write the UTF-8-encoded string in the provided destination buffer:

// Do the actual conversion from UTF-16 to UTF-8
int result = ::WideCharToMultiByte(
    CP_UTF8,            // convert to UTF-8
    kFlags,             // conversion flags
    utf16,              // source UTF-16 string
    utf16Length,        // length of source UTF-16 string, in wchar_ts
    utf8Buffer,         // pointer to destination buffer
    utf8Length,         // size of destination buffer, in chars
    nullptr, nullptr    // unused
);
if (result == 0)
{
    // Conversion error
    // ...
}

Before returning the result CStringA object, we need to release the buffer allocated with CString::GetBuffer, invoking the matching ReleaseBuffer method:

// Don't forget to call ReleaseBuffer on the CString object!
utf8.ReleaseBuffer(utf8Length);

Now we can happily return the utf8 CStringA object, containing the converted UTF-8-encoded string:

    return utf8;

} // End of function ToUtf8

A similar approach can be followed for the inverse conversion from UTF-8 to UTF-16. This time, the Win32 API to invoke is MultiByteToWideChar.

Fortunately, you don’t have to write this kind of code from scratch. On my GitHub page, I have uploaded some easy-to-use C++ code that I wrote that implements these two Unicode UTF-16/UTF-8 conversion functions, using ATL CStringW/A and direct Win32 API calls. Enjoy!

Converting Between Unicode UTF-16 and UTF-8 Using ATL Helpers

Do you have some international text (e.g. Japanese) stored as Unicode UTF-16 CString in your C++ application, and want to convert it to UTF-8 for cross-platform/external export? A couple of simple-to-use ATL helpers can come in handy!

Someone had a CString object containing a Japanese string loaded from an MFC application’s resources, and they wanted to convert that Japanese string to Unicode UTF-8.

// CString loaded from application resources.
// The C++ application is built with Visual Studio in Unicode mode,
// so CString is equivalent to CStringW in this context.
// The CStringW object stores the string using 
// the Unicode UTF-16 encoding.
CString text;  // CStringW, text encoded in UTF-16
text.LoadString(IDS_SOME_JAPANESE_TEXT);

// How to convert that text to UTF-8??

First, the C++/MFC application was built in Unicode mode (which has been the default since VS 2005); so, CString is equivalent to CStringW in that context. The CStringW object stores the string as text encoded in Unicode UTF-16.

How can you convert that to Unicode UTF-8?

One option is to invoke Win32 APIs like WideCharToMultiByte; however, note that this requires writing non-trivial error-prone C++ code.

Another option is to use some conversion helpers from ATL. Note that these ATL string conversion helpers can be used both in MFC/C++ applications and in Win32/C++ applications that aren’t built using the MFC framework.

In particular, to solve the problem at hand, you can use the ATL CW2A conversion helper to convert the original UTF-16-encoded CStringW to a CStringA object that stores the same text encoded in UTF-8:

#include <atlconv.h> // for ATL conversion helpers like CW2A


// 'text' is a CStringW object, encoded using UTF-16.
// Convert it to UTF-8, and store it in a CStringA object.
// NOTE the *CP_UTF8* conversion flag specified to CW2A:
CStringA utf8Text = CW2A(text, CP_UTF8);

// Now the CStringA utf8Text object contains the equivalent 
// of the original UTF-16 string, but encoded in UTF-8.
//
// You can use utf8Text where a UTF-8 const char* pointer 
// is needed, even to build a std::string object that contains 
// the UTF-8-encoded string, for example:
// 
//   std::string utf8(utf8Text);
//

CW2A is basically a typedef to a particular CW2AEX template implemented by ATL, which contains C++ code that invokes the aforementioned WideCharToMultiByte Win32 API, in addition to properly managing the memory for the converted string.

But you can ignore the details, and simply use CW2A with the CP_UTF8 flag for the conversion from UTF-16 to UTF-8:

// Some UTF-16 encoded text
CStringW utf16Text = ...;

// Convert it from UTF-16 to UTF-8 using CW2A:
// ** Don't forget the CP_UTF8 flag **
CStringA utf8Text = CW2A(utf16Text, CP_UTF8);

In addition, there is a symmetric conversion helper that you can use to convert from UTF-8 to UTF-16: CA2W. You can use it like this:

// Some UTF-8 encoded text
CStringA utf8Text = ...;

// Convert it from UTF-8 to UTF-16 using CA2W:
// ** Don't forget the CP_UTF8 flag **
CStringW utf16Text = CA2W(utf8Text, CP_UTF8);

Let’s wrap up this post with these (hopefully useful) Unicode UTF-16/UTF-8 conversion tables:

ATL/MFC String Class    Unicode Encoding
CStringW                UTF-16
CStringA                UTF-8
ATL/MFC CString classes and their associated Unicode encoding

ATL Conversion Helper   From      To
CW2A                    UTF-16    UTF-8
CA2W                    UTF-8     UTF-16
ATL CW2A and CA2W string conversion helpers