
Giovanni Dicanio's Blog

Giovanni Dicanio's Programming Corner on the Internet



How to Convert from Japanese EUC (EUC-JP) to Unicode?

Win32 APIs like MultiByteToWideChar (or ATL helpers like CA2W) can do the job once you know the EUC-JP code page ID; converting to UTF-8 takes an additional intermediate step via UTF-16.

Japanese EUC (Extended Unix Code), or EUC-JP, is a variable-length multi-byte encoding used to represent Japanese characters. For example, I found this encoding used in a Japanese/English dictionary file. How can you convert from it to Unicode?
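To make "variable-length" concrete, here is a minimal sketch that walks a byte string and counts characters, assuming the commonly documented EUC-JP layout: one byte for ASCII, two bytes for JIS X 0208 characters, two bytes starting with 0x8E for half-width katakana, and three bytes starting with 0x8F for JIS X 0212 characters. The function names are made up for this illustration, and the sketch only checks byte ranges; it does not map anything to Unicode.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: byte length of the EUC-JP sequence that starts with
// `lead`, or 0 if `lead` cannot start a sequence (assumed standard layout).
constexpr std::size_t EucJpSequenceLength(unsigned char lead)
{
    if (lead <= 0x7F) return 1;                  // ASCII
    if (lead == 0x8E) return 2;                  // 0x8E + half-width katakana
    if (lead == 0x8F) return 3;                  // 0x8F + JIS X 0212 (two more bytes)
    if (lead >= 0xA1 && lead <= 0xFE) return 2;  // JIS X 0208
    return 0;
}

// Hypothetical helper: counts the characters in an EUC-JP byte string.
// Returns std::nullopt if the input is truncated or a byte is out of range.
std::optional<std::size_t> CountEucJpCharacters(std::string_view text)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < text.size())
    {
        const auto lead = static_cast<unsigned char>(text[i]);
        const std::size_t length = EucJpSequenceLength(lead);
        if (length == 0 || length > text.size() - i)
        {
            return std::nullopt;  // invalid lead byte or truncated sequence
        }

        // Light range check on the trailing bytes (0xA1..0xFE).
        for (std::size_t k = 1; k < length; ++k)
        {
            const auto trail = static_cast<unsigned char>(text[i + k]);
            if (trail < 0xA1 || trail > 0xFE)
            {
                return std::nullopt;
            }
        }

        i += length;
        ++count;
    }
    return count;
}
```

With this layout, a string made of one ASCII letter followed by two kanji occupies five bytes but counts as three characters, which is why byte offsets and character positions cannot be used interchangeably.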

Well, first, “converting to Unicode” needs to be made more precise: Do you want to convert to Unicode UTF-16, or to UTF-8?

If you want to display the Japanese text encoded in EUC-JP in some Windows graphical application, you need to convert to Unicode UTF-16, as this is the “native” Unicode encoding used by the Win32 APIs.

So, to convert from EUC-JP to UTF-16 you can invoke the MultiByteToWideChar Win32 API (or use the CA2W ATL conversion helper), as discussed in several posts in the series on Unicode Conversions. The trick here is to identify the correct code page for EUC-JP.

The MSDN page on Code Page Identifiers lists 20932 as the code page ID for EUC-JP.

I couldn’t find a preprocessor macro in the Windows Platform SDK defining the aforementioned code page ID (unlike, for example, CP_UTF8), but you can simply create a named constant for that purpose, for example:

// Japanese EUC or EUC-JP Code Page ID
constexpr UINT kCodePage_JapaneseEuc = 20932;

Then you can pass this named constant (instead of the “magic number” 20932) as the first parameter to MultiByteToWideChar, or as the second parameter to the ATL CA2W constructor overload that takes an input string and a code page ID for the conversion.

In this way, you can convert your input text encoded in EUC-JP to Unicode UTF-16, ready to be passed across the Win32 API boundary.
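As a rough illustration of the MultiByteToWideChar route, the following sketch wraps the usual two-call pattern (first ask for the required length, then convert) in a small function. The function name EucJpToUtf16, the choice of exceptions for error reporting, and passing 0 as the flags are assumptions made for this example, not something the API mandates. The code is wrapped in an _WIN32 check because it only makes sense on Windows.

```cpp
#ifdef _WIN32  // MultiByteToWideChar is a Win32 API: Windows-only sketch

#include <windows.h>

#include <climits>
#include <stdexcept>
#include <string>
#include <string_view>

// Japanese EUC or EUC-JP Code Page ID
constexpr UINT kCodePage_JapaneseEuc = 20932;

// Hypothetical helper: converts EUC-JP text to UTF-16.
// Throws on failure (error handling policy chosen for this sketch).
std::wstring EucJpToUtf16(std::string_view eucJp)
{
    if (eucJp.empty())
    {
        return std::wstring{};
    }

    // The Win32 API takes lengths as int, so guard against overflow.
    if (eucJp.size() > static_cast<size_t>(INT_MAX))
    {
        throw std::overflow_error("Input string is too long.");
    }
    const int inputLength = static_cast<int>(eucJp.size());

    // Flags set to 0 here (an assumption of this sketch); stricter
    // validation flags could be considered.
    constexpr DWORD kFlags = 0;

    // First call: get the length of the result, in wchar_ts.
    const int utf16Length = ::MultiByteToWideChar(
        kCodePage_JapaneseEuc, kFlags,
        eucJp.data(), inputLength,
        nullptr, 0);
    if (utf16Length == 0)
    {
        throw std::runtime_error("Cannot get the UTF-16 length.");
    }

    // Second call: do the actual conversion into the destination buffer.
    std::wstring utf16(static_cast<size_t>(utf16Length), L'\0');
    const int written = ::MultiByteToWideChar(
        kCodePage_JapaneseEuc, kFlags,
        eucJp.data(), inputLength,
        utf16.data(), utf16Length);
    if (written == 0)
    {
        throw std::runtime_error("Cannot convert from EUC-JP to UTF-16.");
    }

    return utf16;
}

#endif // _WIN32
```

The returned std::wstring can then be passed (via c_str()) to Win32 APIs that take wide strings.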

Now, what about converting from EUC-JP to UTF-8? Well, you cannot perform this conversion in a single call, because MultiByteToWideChar always produces UTF-16: You have to take an additional intermediate step and go through UTF-16. Basically, you can follow these steps:

  1. Convert from EUC-JP to UTF-16 via MultiByteToWideChar (or ATL CA2W) and the EUC-JP code page ID.
  2. Convert from UTF-16 to UTF-8 via WideCharToMultiByte (or ATL CW2A) and the CP_UTF8 “code page” ID.
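The two steps can be sketched with the plain Win32 APIs as follows. The helper names (ToUtf16, Utf16ToUtf8, EucJpToUtf8, CheckedLength) and the exception-based error handling are assumptions made for this example; the code is wrapped in an _WIN32 check because it relies on Windows-only APIs.

```cpp
#ifdef _WIN32  // Win32 conversion APIs: Windows-only sketch

#include <windows.h>

#include <climits>
#include <stdexcept>
#include <string>
#include <string_view>

// Japanese EUC or EUC-JP Code Page ID
constexpr UINT kCodePage_JapaneseEuc = 20932;

// Hypothetical helper: the Win32 APIs take lengths as int.
inline int CheckedLength(size_t length)
{
    if (length > static_cast<size_t>(INT_MAX))
    {
        throw std::overflow_error("Input string is too long.");
    }
    return static_cast<int>(length);
}

// Step 1: from a multi-byte encoding (given by codePage) to UTF-16.
std::wstring ToUtf16(std::string_view input, UINT codePage)
{
    if (input.empty()) return {};
    const int inputLength = CheckedLength(input.size());

    const int utf16Length = ::MultiByteToWideChar(
        codePage, 0, input.data(), inputLength, nullptr, 0);
    if (utf16Length == 0)
    {
        throw std::runtime_error("Cannot get the UTF-16 length.");
    }

    std::wstring utf16(static_cast<size_t>(utf16Length), L'\0');
    if (::MultiByteToWideChar(codePage, 0, input.data(), inputLength,
                              utf16.data(), utf16Length) == 0)
    {
        throw std::runtime_error("Cannot convert to UTF-16.");
    }
    return utf16;
}

// Step 2: from UTF-16 to UTF-8.
std::string Utf16ToUtf8(std::wstring_view utf16)
{
    if (utf16.empty()) return {};
    const int utf16Length = CheckedLength(utf16.size());

    // With CP_UTF8, the last two parameters must be null.
    const int utf8Length = ::WideCharToMultiByte(
        CP_UTF8, WC_ERR_INVALID_CHARS, utf16.data(), utf16Length,
        nullptr, 0, nullptr, nullptr);
    if (utf8Length == 0)
    {
        throw std::runtime_error("Cannot get the UTF-8 length.");
    }

    std::string utf8(static_cast<size_t>(utf8Length), '\0');
    if (::WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                              utf16.data(), utf16Length,
                              utf8.data(), utf8Length,
                              nullptr, nullptr) == 0)
    {
        throw std::runtime_error("Cannot convert from UTF-16 to UTF-8.");
    }
    return utf8;
}

// EUC-JP -> UTF-16 -> UTF-8
std::string EucJpToUtf8(std::string_view eucJp)
{
    return Utf16ToUtf8(ToUtf16(eucJp, kCodePage_JapaneseEuc));
}

#endif // _WIN32
```

Keeping the first step parameterized on the code page ID means the same pair of helpers could, in principle, serve other legacy encodings as well; only the code page constant changes.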

I already discussed this pattern in the blog post on converting between Japanese Shift JIS and Unicode UTF-8/UTF-16.

P.S. These days, if I have the freedom to pick an encoding for representing a text file, I would use Unicode UTF-8. But you may need to deal with legacy file formats, or very language-specific formats used in some particular contexts, so these kinds of conversions can be necessary.

Author: giovannidicanio. Posted on October 25, 2023. Categories: Windows C++ Programming. Tags: ATL, C++, EUC-JP, Strings, Unicode, Unicode Conversions, UTF-16, UTF-8.
