Converting Between Unicode UTF-16 and UTF-8 Using ATL Helpers

Do you have some international text (e.g. Japanese) stored as Unicode UTF-16 CString in your C++ application, and want to convert it to UTF-8 for cross-platform/external export? A couple of simple-to-use ATL helpers can come in handy!

Someone had a CString object containing a Japanese string loaded from an MFC application resources, and they wanted to convert that Japanese string to Unicode UTF-8.

// CString loaded from application resources.
// The C++ application is built with Visual Studio in Unicode mode,
// so CString is equivalent to CStringW in this context.
// The CStringW object stores the string using 
// the Unicode UTF-16 encoding.
CString text;  // CStringW, text encoded in UTF-16
text.LoadString(IDS_SOME_JAPANESE_TEXT);

// How to convert that text to UTF-8??

First, the C++/MFC application was built in Unicode mode (which has been the default since VS 2005); so, CString is equivalent to CStringW in that context. The CStringW object stores the string as text encoded in Unicode UTF-16.

How can you convert that to Unicode UTF-8?

One option is to invoke Win32 APIs like WideCharToMultiByte; however, note that this requires writing non-trivial error-prone C++ code.

Another option is to use some conversion helpers from ATL. Note that these ATL string conversion helpers can be used in both MFC/C++ applications, and also in Win32/C++ applications that aren’t built using the MFC framework.

In particular, to solve the problem at hand, you can use the ATL CW2A conversion helper to convert the original UTF-16-encoded CStringW to a CStringA object that stores the same text encoded in UTF-8:

#include <atlconv.h> // for ATL conversion helpers like CW2A


// 'text' is a CStringW object, encoded using UTF-16.
// Convert it to UTF-8, and store it in a CStringA object.
// NOTE the *CP_UTF8* conversion flag specified to CW2A:
CStringA utf8Text = CW2A(text, CP_UTF8);

// Now the CStringA utf8Text object contains the equivalent 
// of the original UTF-16 string, but encoded in UTF-8.
//
// You can use utf8Text where a UTF-8 const char* pointer 
// is needed, even to build a std::string object that contains 
// the UTF-8-encoded string, for example:
// 
//   std::string utf8(utf8Text);
//

CW2A is basically a typedef to a particular CW2AEX template implemented by ATL, which contains C++ code that invokes the aforementioned WideCharToMultiByte Win32 API, in addition to properly manage the memory for the converted string.

But you can ignore the details, and simply use CW2A with the CP_UTF8 flag for the conversion from UTF-16 to UTF-8:

// Some UTF-16 encoded text
CStringW utf16Text = ...;

// Convert it from UTF-16 to UTF-8 using CW2A:
// ** Don't forget the CP_UTF8 flag **
CStringA utf8Text = CW2A(utf16Text, CP_UTF8);

In addition, there is a symmetric conversion helper that you can use to convert from UTF-8 to UTF-16: CA2W. You can use it like this:

// Some UTF-8 encoded text
CStringA utf8Text = ...;

// Convert it from UTF-8 to UTF-16 using CA2W:
// ** Don't forget the CP_UTF8 flag **
CStringW utf16Text = CA2W(utf8Text, CP_UTF8);

Let’s wrap up this post with these (hopefully useful) Unicode UTF-16/UTF-8 conversion tables:

ATL/MFC String ClassUnicode Encoding
CStringWUTF-16
CStringAUTF-8
ATL/MFC CString classes and their associated Unicode encoding

ATL Conversion HelperFromTo
CW2AUTF-16UTF-8
CA2WUTF-8UTF-16
ATL CW2A and CA2W string conversion helpers

2 thoughts on “Converting Between Unicode UTF-16 and UTF-8 Using ATL Helpers”

Leave a comment