Unicode Conversions with String Views as Input Parameters

Replacing input STL string parameters with string views: Is it always possible?

In a previous blog post, I showed how to convert between Unicode UTF-8 and UTF-16 using STL string classes like std::string and std::wstring. The std::string class can store UTF-8-encoded text, and, on Windows (where wchar_t is 16 bits wide), the std::wstring class can store UTF-16-encoded text. The C++ Unicode conversion code is available on GitHub as an open-source project.

That code passes input string parameters by const reference (const&) to STL string objects:

// Convert from UTF-16 to UTF-8
std::string ToUtf8(std::wstring const& utf16);

// Convert from UTF-8 to UTF-16
std::wstring ToUtf16(std::string const& utf8);

Starting with C++17, you can also use string views for input string parameters. Because string views are cheap to copy, they can simply be passed by value (instead of by const&). For example:

// Convert from UTF-16 to UTF-8
std::string ToUtf8(std::wstring_view utf16);

// Convert from UTF-8 to UTF-16
std::wstring ToUtf16(std::string_view utf8);

As you can see, I replaced the input std::wstring const& parameter above with a simpler std::wstring_view passed by value. Similarly, std::string const& was replaced with std::string_view.

Important Gotcha on String Views and Null Termination

There is an important caveat here. The WideCharToMultiByte and MultiByteToWideChar Win32 C-interface APIs used in the conversion code accept input strings in two forms:

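  1. A null-terminated C-style string pointer (with the length parameter set to -1)
  2. A string pointer plus an explicit length (in bytes for MultiByteToWideChar, in wchar_ts for WideCharToMultiByte)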

In my code, I used the second option, i.e., the explicit-length (counted) form of those APIs. So replacing STL string classes with string views works just fine in this case, because a string view can be seen as a pointer plus a “size”, or count of characters.

A representation of string views: pointer + size

But string views are not necessarily null-terminated, so you cannot safely pass a string view's data() pointer to an API that expects a null-terminated C-style string. Such an API ignores the view's size and keeps reading past its last character until it happens to find a null, reading characters that do not belong to the view, or even memory beyond the underlying buffer. This is a very important point to keep in mind to avoid subtle and dangerous bugs when using input string view parameters.

The modified code that uses input string view parameters instead of STL string classes passed by const& can be found in this branch of the main Unicode string conversion project on GitHub.
