Converting Between Unicode UTF-16 and UTF-8 Using C++ Standard Library’s Strings and Direct Win32 API Calls

std::string storing UTF-8-encoded text is a good option for C++ cross-platform code. Let’s discuss how to convert between that and UTF-16-encoded wstrings, using direct Win32 API calls.

Last time we saw how to convert between Unicode UTF-16 and UTF-8 using ATL strings and direct Win32 API calls. Now let’s focus on doing the same Unicode UTF-16/UTF-8 conversions but this time using C++ Standard Library’s strings. In particular, you can use std::wstring to represent UTF-16-encoded strings, and std::string for UTF-8-encoded ones.

Using the same coding style as in the previous blog post, the conversion function prototypes can look like this:

// Convert from UTF-16 to UTF-8
std::string ToUtf8(std::wstring const& utf16);
 
// Convert from UTF-8 to UTF-16
std::wstring ToUtf16(std::string const& utf8);

As an alternative, you may consider the C++ Standard Library snake_case style, and the various std::to_string and std::to_wstring overloaded functions, and use something like this:

// Convert from UTF-16 to UTF-8
std::string to_utf8_string(std::wstring const& utf16);
 
// Convert from UTF-8 to UTF-16
std::wstring to_utf16_wstring(std::string const& utf8);

Anyway, let’s keep the former coding style already used in the previous blog post.

The conversion code is very similar to what you already saw for the ATL CString case.

In particular, considering the UTF-16-to-UTF-8 conversion, you can start with the special case of an empty input string:

std::string ToUtf8(std::wstring const& utf16)
{
    // Special case of empty input string
    if (utf16.empty())
    {
        // Empty input --> return empty output string
        return std::string{};
    }

Then you can invoke the WideCharToMultiByte API to figure out the size of the destination UTF-8 string:

// Get the length, in chars, of the resulting UTF-8 string
const int utf8Length = ::WideCharToMultiByte(
    CP_UTF8,            // convert to UTF-8
    kFlags,             // conversion flags
    utf16.data(),       // source UTF-16 string
    utf16Length,        // length of source UTF-16 string, in wchar_ts
    nullptr,            // unused - no conversion required in this step
    0,                  // request size of destination buffer, in chars
    nullptr, nullptr    // unused
);
if (utf8Length == 0)
{
   // Conversion error: capture error code and throw
   ...
}
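
The snippet above uses two names that this post doesn't define: kFlags and utf16Length. As a minimal sketch, assuming kFlags is meant to be WC_ERR_INVALID_CHARS (so that invalid UTF-16 input makes the API fail instead of being silently replaced) and utf16Length is simply the input length converted to int, you could prepare them along these lines. The helper name CheckedLengthAsInt is hypothetical:

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the original code): returns the string
// length as an int, because the Win32 conversion APIs take int lengths.
// Throws std::overflow_error if the length doesn't fit in an int.
inline int CheckedLengthAsInt(std::wstring const& text)
{
    // The parentheses around max guard against the max() macro
    // that <windows.h> may define.
    constexpr std::size_t kMaxInt =
        static_cast<std::size_t>((std::numeric_limits<int>::max)());

    if (text.length() > kMaxInt)
    {
        throw std::overflow_error(
            "Input string is too long: its length doesn't fit in an int.");
    }

    return static_cast<int>(text.length());
}

// Inside ToUtf8, on Windows, the two names could then be set up like this
// (assumed values, shown as comments because they need <windows.h>):
//
//   constexpr DWORD kFlags = WC_ERR_INVALID_CHARS;
//   const int utf16Length = CheckedLengthAsInt(utf16);
```

The explicit check matters because std::wstring::length returns a size_t, while WideCharToMultiByte expects the source length as an int; a blind cast could silently truncate a very large length.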

Note that with CString you could simply pass CString instances to the WideCharToMultiByte parameters expecting a const wchar_t* (thanks to the implicit conversion from CStringW to const wchar_t*), whereas with std::wstring you have to explicitly invoke a method to get that read-only wchar_t pointer. I invoked the wstring::data method; another option is to call the wstring::c_str method.

Moreover, you can define a custom C++ exception class to represent a conversion error, and throw instances of this exception on failure. For example, you could derive that exception from std::runtime_error, and add a DWORD data member to represent the error code returned by the GetLastError Win32 API.
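
As a rough sketch of such an exception class: the name Utf8ConversionError and the ErrorCode accessor are hypothetical, and std::uint32_t is used here as a stand-in for DWORD (a 32-bit unsigned type on Windows) so that the snippet doesn't depend on <windows.h>:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical exception type representing a UTF-8/UTF-16 conversion error.
// It carries the Win32 error code alongside the usual what() message.
class Utf8ConversionError : public std::runtime_error
{
public:
    // errorCode is expected to be the value returned by GetLastError
    Utf8ConversionError(const char* message, std::uint32_t errorCode)
        : std::runtime_error(message)
        , m_errorCode(errorCode)
    {
    }

    Utf8ConversionError(std::string const& message, std::uint32_t errorCode)
        : std::runtime_error(message)
        , m_errorCode(errorCode)
    {
    }

    // Win32 error code captured at the failure site
    std::uint32_t ErrorCode() const noexcept
    {
        return m_errorCode;
    }

private:
    std::uint32_t m_errorCode; // stand-in for DWORD
};
```

At a failure site you could then write something like throw Utf8ConversionError("Cannot convert from UTF-16 to UTF-8.", ::GetLastError()); and callers that catch the exception could inspect ErrorCode(), for example to check for ERROR_NO_UNICODE_TRANSLATION.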

Once you know the size of the destination UTF-8 string, you can create a std::string object of the proper size, using the constructor overload that takes a size parameter (utf8Length) and a fill character (' '):

// Make room in the destination string for the converted bits
std::string utf8(utf8Length, ' ');

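To get write access to the std::string object’s internal buffer, you can invoke the std::string::data method (the non-const overload returning char* is available since C++17; with earlier standards, &utf8[0] is an alternative):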

char* utf8Buffer = utf8.data();
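
To see this "pre-size the string, then write through the pointer" pattern in isolation, here is a small portable sketch with no Win32 calls. The function name CopyIntoString is hypothetical, and std::memcpy merely stands in for the API that fills the buffer. It assumes C++17 or later for the non-const data() overload:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a std::string of the given length and fills it
// by writing directly into the string's internal buffer.
inline std::string CopyIntoString(const char* source, std::size_t length)
{
    // Make room in the destination string: 'length' chars, all spaces for now
    std::string result(length, ' ');

    // Writable pointer to the internal buffer (non-const data(): C++17).
    // With older standards, &result[0] works for a non-empty string.
    char* buffer = result.data();

    // Overwrite the placeholder spaces; this is the step that
    // WideCharToMultiByte performs in the real conversion code.
    std::memcpy(buffer, source, length);

    return result;
}
```

Writing exactly length chars is fine here because the string was created with that size; the terminating NUL that std::string maintains after the last character is left untouched.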

Now you can invoke the WideCharToMultiByte API a second time to perform the actual conversion, writing into the properly sized destination string created above:

// Do the actual conversion from UTF-16 to UTF-8
int result = ::WideCharToMultiByte(
    CP_UTF8,            // convert to UTF-8
    kFlags,             // conversion flags
    utf16.data(),       // source UTF-16 string
    utf16Length,        // length of source UTF-16 string, in wchar_ts
    utf8Buffer,         // pointer to destination buffer
    utf8Length,         // size of destination buffer, in chars
    nullptr, nullptr    // unused
);
if (result == 0)
{
    // Conversion error: capture error code and throw
    ...
}

Finally, you can simply return the resulting UTF-8 string back to the caller:

    return utf8;

} // End of function ToUtf8 

Note that with C++ Standard Library strings you don’t need the GetBuffer/ReleaseBuffer “dance” required by ATL CStrings.
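
The opposite direction isn't walked through step by step in this post, but following the same two-call pattern it could look roughly like the sketch below, which swaps in the MultiByteToWideChar API. This is only a sketch under assumptions, not necessarily what the library does: the MB_ERR_INVALID_CHARS flag, the overflow check, and the use of plain std::runtime_error and std::overflow_error (instead of a custom exception carrying the GetLastError code) are illustrative choices. The #ifdef guard just marks the code as Windows-only:

```cpp
#ifdef _WIN32

#include <windows.h>

#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Sketch: convert from UTF-8 to UTF-16, mirroring the steps of ToUtf8
inline std::wstring ToUtf16(std::string const& utf8)
{
    // Special case of empty input string
    if (utf8.empty())
    {
        return std::wstring{};
    }

    // Assumption: fail on invalid UTF-8 input instead of silently replacing it
    constexpr DWORD kFlags = MB_ERR_INVALID_CHARS;

    // The Win32 API takes int lengths, so check for overflow before casting
    if (utf8.length() > static_cast<std::size_t>((std::numeric_limits<int>::max)()))
    {
        throw std::overflow_error("Input string is too long: length doesn't fit in an int.");
    }
    const int utf8Length = static_cast<int>(utf8.length());

    // First call: get the length, in wchar_ts, of the resulting UTF-16 string
    const int utf16Length = ::MultiByteToWideChar(
        CP_UTF8, kFlags, utf8.data(), utf8Length, nullptr, 0);
    if (utf16Length == 0)
    {
        throw std::runtime_error("Cannot get the UTF-16 length (MultiByteToWideChar failed).");
    }

    // Make room in the destination string for the converted text
    std::wstring utf16(static_cast<std::size_t>(utf16Length), L' ');

    // Second call: do the actual conversion.
    // &utf16[0] gives a writable pointer even before C++17.
    const int result = ::MultiByteToWideChar(
        CP_UTF8, kFlags, utf8.data(), utf8Length, &utf16[0], utf16Length);
    if (result == 0)
    {
        throw std::runtime_error("Cannot convert from UTF-8 to UTF-16 (MultiByteToWideChar failed).");
    }

    return utf16;
}

#endif // _WIN32
```

Because the source length is passed explicitly rather than as -1, the API doesn't append a terminating NUL to the output, so the length returned by the first call is exactly the number of wchar_ts to reserve in the std::wstring.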

I developed an easy-to-use C++ header-only library containing compilable code implementing these Unicode UTF-16/UTF-8 conversions using std::wstring/std::string; you can find it in this GitHub repo of mine.