How Do I Convert Between Japanese Shift JIS and Unicode UTF-8/UTF-16?

When you need a “MultiByteToMultiByte” conversion API, and there is none, you can use a two-step conversion process with UTF-16 coming to the rescue.

Shift JIS is a text encoding for the Japanese language. Although Unicode is far more widely used these days, you may still come across Japanese text encoded in Shift JIS, so you may need to convert text between Shift JIS (SJIS) and Unicode UTF-8 or UTF-16.

If you need to convert from SJIS to UTF-16, you can invoke the MultiByteToWideChar Win32 API, passing the Shift JIS code page identifier, which is 932. Similarly, for the opposite conversion, from UTF-16 to SJIS, you can invoke the WideCharToMultiByte API, passing the same SJIS code page ID.

You can simply reuse and adapt the C++ code discussed in the previous blog posts on Unicode UTF-16/UTF-8 conversions (using STL strings or ATL CString), which call the WideCharToMultiByte and MultiByteToWideChar APIs mentioned above: just pass the Shift JIS code page (932) instead of CP_UTF8.

Things become slightly more complicated (and interesting) if you need to convert between Shift JIS and Unicode UTF-8, because in that case there is no “MultiByteToMultiByte” Win32 API available. But fear not! 🙂 You can simply perform the conversion in two steps.

For example, to convert from Shift JIS to Unicode UTF-8, you can:

  1. Invoke MultiByteToWideChar with the Shift JIS code page (932) to convert from Shift JIS to UTF-16.
  2. Invoke WideCharToMultiByte with CP_UTF8 to convert the UTF-16 text returned by the previous step to UTF-8.

In other words, you can use UTF-16 as a “temporary” intermediate encoding in this two-phase conversion process.

Similarly, if you want to convert from Unicode UTF-8 to Shift JIS, you can:

  1. Invoke MultiByteToWideChar with CP_UTF8 to convert from UTF-8 to UTF-16.
  2. Invoke WideCharToMultiByte with the Shift JIS code page (932) to convert the UTF-16 text returned by the previous step to Shift JIS.
