How to Print Unicode Text to the Windows Console in C++

How can you print Unicode text to the Windows console in your C++ programs? Let’s discuss both the UTF-16 and UTF-8 encoding cases.

Suppose that you want to print out some Unicode text to the Windows console. From a simple C++ console application created in Visual Studio, you may try this line of code inside main:

std::wcout << L"Japan written in Japanese: \x65e5\x672c (Nihon)\n";

The idea is to print the following text:

Japan written in Japanese: 日本 (Nihon)

The Unicode UTF-16 encoding of the first Japanese kanji is 0x65E5; the second kanji is encoded in UTF-16 as 0x672C. These are embedded in the C++ string literal sent to std::wcout using the escape sequences \x65e5 and \x672c respectively.
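The code point to UTF-16 code unit relationship can be sketched in a few lines of portable C++. This is a minimal illustration, not code from this article. The function name EncodeUtf16 is hypothetical, and the sketch assumes the input is a valid Unicode scalar value. Code points in the Basic Multilingual Plane (up to U+FFFF), like these two kanjis, are stored as a single 16-bit code unit with the same numeric value. That is why the escapes \x65e5 and \x672c work directly. Higher code points would need a surrogate pair.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-16 code units.
// Precondition: cp is a valid Unicode scalar value
// (at most 0x10FFFF and not in the surrogate range 0xD800-0xDFFF).
std::u16string EncodeUtf16(char32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string units;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one code unit, same value as the code point
        units.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: high surrogate followed by low surrogate
        const char32_t v = cp - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return units;
}
```

For example, EncodeUtf16(0x65E5) + EncodeUtf16(0x672C) yields the same two code units as the literal u"\x65e5\x672c". On Windows, wchar_t is a 16-bit type, so those are also the code units stored in L"\x65e5\x672c".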

If you try to execute the above code, you get the following output:

[Screenshot: wrong output, the Japanese kanjis are missing!]

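As you can see, the Japanese kanjis are not printed. Moreover, even the “standard ASCII” characters following them (i.e., “(Nihon)”) are missing. There’s clearly a bug in the above code. With the default settings, std::wcout fails to convert the kanjis for output. That failure puts the stream into an error state, and everything sent to std::wcout afterwards is silently discarded.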

How can you fix that?

Well, the missing piece is setting the proper translation mode for stdout to Unicode UTF-16, using _setmode and the _O_U16TEXT mode parameter.

// Change stdout to Unicode UTF-16
_setmode(_fileno(stdout), _O_U16TEXT);

Now the output is what you expect:

[Screenshot: the correct output of Unicode UTF-16 text, including the Japanese kanjis.]

The complete compilable C++ code follows:

// Printing Unicode UTF-16 text to the Windows console

#include <fcntl.h>      // for _O_U16TEXT
#include <io.h>         // for _setmode
#include <stdio.h>      // for _fileno

#include <iostream>     // for std::wcout

int main()
{
    // Change stdout to Unicode UTF-16
    _setmode(_fileno(stdout), _O_U16TEXT);

    // Print some Unicode text encoded in UTF-16
    std::wcout << L"Japan written in Japanese: \x65e5\x672c (Nihon)\n";
}

(The above code was compiled with VS 2019 and executed in the Windows 10 command prompt.)

Note that the font you use in the Windows console must support the characters you want to print; in this example, I used the MS Gothic font to show the Japanese kanjis.

The Unicode UTF-8 Case

What about printing text using Unicode UTF-8 instead of UTF-16 (especially given all the suggestions to use “UTF-8 everywhere”)?

Well, you may try to invoke _setmode and this time pass the UTF-8 mode flag _O_U8TEXT (instead of the previous _O_U16TEXT), like this:

// Change stdout to Unicode UTF-8
_setmode(_fileno(stdout), _O_U8TEXT);

And then send the UTF-8 encoded text via std::cout:

// Print some Unicode text encoded in UTF-8
std::cout << "Japan written in Japanese: \xE6\x97\xA5\xE6\x9C\xAC (Nihon)\n";

If you build and run that code, you get… an assertion failure!

[Screenshot: Visual C++ debug assertion failure when trying to print Unicode UTF-8-encoded text.]

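So, it seems that this (logical) scenario is not supported, at least with VS2019 and Windows 10. In fact, the Microsoft documentation for _setmode states that the Unicode translation modes are meant for wide-character print functions (like wprintf) and are not supported for narrow print functions. Using a narrow print function on a stream in Unicode mode triggers an assertion, and sending char text through std::cout is exactly that case.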
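As a side note, the UTF-8 bytes in the std::cout string literal above follow mechanically from the code points. In binary, U+65E5 is 0110 0101 1110 0101. Code points from U+0800 to U+FFFF take three bytes with the layout 1110xxxx 10xxxxxx 10xxxxxx, so the 16 bits split into 0110 | 010111 | 100101. That gives the bytes 0xE6 0x97 0xA5. U+672C gives 0xE6 0x9C 0xAC the same way. The following minimal sketch performs that computation. EncodeUtf8 is a hypothetical helper name, and the sketch assumes the input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
// Precondition: cp is a valid Unicode scalar value
// (at most 0x10FFFF and not in the surrogate range 0xD800-0xDFFF).
std::string EncodeUtf8(char32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));  // 10xxxxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    }
    return out;
}
```

With this helper, EncodeUtf8(0x65E5) + EncodeUtf8(0x672C) produces the same six bytes as "\xE6\x97\xA5\xE6\x9C\xAC".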

How can you solve this problem? Well, an option is to take the Unicode UTF-8 encoded text, convert it to UTF-16 (for example using this code), and then use the method discussed above to print out the UTF-16 encoded text.
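As a rough, portable illustration of that conversion step, here is a minimal UTF-8 to UTF-16 decoder. It is a sketch under stated assumptions, not the code linked above. The name Utf8ToUtf16 is hypothetical. The sketch produces a std::u16string and throws std::invalid_argument on malformed input. In real Windows code, the usual way to perform this conversion is the Win32 function MultiByteToWideChar with the CP_UTF8 code page.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 text to UTF-16 in portable C++.
// Throws std::invalid_argument on malformed input (bad lead or continuation
// bytes, truncated sequences, overlong forms, surrogates, values > U+10FFFF).
std::u16string Utf8ToUtf16(const std::string& utf8)
{
    std::u16string utf16;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes after the lead

        if (lead < 0x80)                { cp = lead;        extra = 0; }  // 0xxxxxxx
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }  // 11110xxx
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Append the 6 payload bits of each continuation byte (10xxxxxx)
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }

        // Smallest code point that legitimately needs 'extra' continuation bytes
        static const char32_t kMinForLength[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < kMinForLength[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");

        i += extra + 1;

        if (cp < 0x10000) {
            // Basic Multilingual Plane: a single UTF-16 code unit
            utf16.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: high surrogate followed by low surrogate
            const char32_t v = cp - 0x10000;
            utf16.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            utf16.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return utf16;
}
```

On Windows, wchar_t is a 16-bit type holding UTF-16 code units. You can therefore copy the result into a std::wstring (for example, std::wstring ws(u16.begin(), u16.end());). You can then print it with std::wcout after the _setmode(_fileno(stdout), _O_U16TEXT) call discussed above.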

EDIT 2023-11-28: Compilable C++ demo code uploaded to GitHub.

[Screenshot: both the Unicode UTF-16 and UTF-8 text correctly printed in the Windows console.]

6 thoughts on “How to Print Unicode Text to the Windows Console in C++”

  1. We use setlocale(LC_ALL, ".UTF8") at the beginning of main() in all of our programs. This even means that filenames are now UTF-8 (using the regular C or C++ functions to access files). However, we had a few cases where command-line arguments containing filenames didn’t work. For those, we use CommandLineToArgvW(GetCommandLineW(), &argc) and then convert the arguments from UTF-16 to UTF-8.


    1. Simon: I tried adding setlocale(LC_ALL, ".UTF8") at the beginning of main(), then:

      std::cout << "Japan written in Japanese: \xE6\x97\xA5\xE6\x9C\xAC (Nihon)\n";

      and I got a couple of ?? instead of the proper Japanese kanjis.
      This is on Windows 10 64-bit and using Visual Studio 2019.
      So, this does NOT work for me.

      Which C++ compiler and OS are you using? Please provide a *reproducible* sample.
      (Moreover, even assuming that this works with some newer version of Visual C++/CRT/Windows, the CommandLineToArgvW call you mentioned confirms that the support is still incomplete.)


      1. Linux and macOS are already UTF8 by default. So, this applies to Windows only.

        We are also still using Visual Studio 2019. In general this works on Windows 10 and 11. I have never tested Japanese, but only German (and I think we once had someone use Chinese).

        One other thing we do is to select the Unicode Character Set in the project settings (Configuration Properties -> Advanced -> Character Set). It is important that both the source and execution character sets are UTF-8 (explicit switch /utf-8). Especially with the source character set being UTF-8, I would not escape the Japanese characters, but write them directly. For this, it is important that the source file is actually saved as UTF-8 (I am not sure where exactly this needs to be configured in VS).


      2. Simon: This blog post is about *Windows* (as can be read from the title), not macOS or Linux. I showed you a repro with code that doesn’t work (including yours), and how to make things work. Unfortunately, as of today, UTF-8 support on Windows is more limited. Your suggestion doesn’t work for perfectly valid UTF-8-encoded Japanese text. We need to convert to UTF-16 for that, which has been the default Unicode encoding on Windows for a very long time. Things could change in the future, though (but there would still be a lot of UTF-16-based Windows C++ code that would need attention and maintenance).


      3. I tried a few more things. My development machine is Windows 11 and the code page is set to UTF8 (there is an option for this on Windows 11). In this case my suggested approach works. I tried it on another computer and could reproduce your results: It didn’t work (code page was set for Germany) and I also saw two question marks. Are you using a Japanese code page for your tests?

        I really thought I found the easiest solution to this problem. We are using our software on Linux and macOS as well. Having output in both UTF8 and UTF16 depending on the platform would be a nightmare. At least with German the German-specific UTF8 characters seem to be printable with a German code page.

        Let’s hope for the future that Windows will also switch to the UTF8 code page by default.


      4. Simon: as your tests confirm, your approach didn’t always work: it only worked in *some cases*, for example on Windows 11 but not on Windows 10, and only for German, not for Japanese, etc. Many people are still using Windows 10 (and several systems still run Windows 7, too).
        And in a truly internationalized application, you need text to work in many languages, not only in German or other European ones.
        I don’t rule out that Windows support for UTF-8 will improve in the future, but that is not the case today, at least up to Windows 10, which is still widely used. Moreover, you also have to consider the interactions among Windows, the UCRT, the VC++ CRT, etc. Everything would have to be moved coherently to full UTF-8 support.

        Moreover, even if a future version of Windows assumed a default UTF-8 code page for the “ANSI” versions of the Win32 APIs, that could conflict with existing code bases that assume the “A” versions of the APIs use other specific code pages.

        As of today, for cross-platform C++ code, I would suggest using UTF-8, but then converting to UTF-16 at the Windows boundary, since UTF-16 has very mature and stable support on Windows. Basically, in all these years, Unicode UTF-16 has been “the default” Unicode mode for Windows. For example, Visual Studio C++ projects have defaulted to Unicode UTF-16 since Visual Studio 2005 (almost 20 years now), and I recall manually switching to Unicode UTF-16 even earlier, with Visual C++ 6 (which defaulted to ANSI/MBCS).

