The Case of string_view and the Magic String

An interesting bug involving the use of string_view instead of std::string const&.

Suppose that in your C++ code base you have a legacy C-interface function:

// Takes a C-style NUL-terminated string pointer as input
void DoSomethingLegacy(const char* s)
{
    // Do something ...

    printf("DoSomethingLegacy: %s\n", s);
}

The above function is called from a C++ function/method, for example:

void DoSomethingCpp(std::string const& s)
{
    // Invoke the legacy C function
    DoSomethingLegacy(s.data());
}

The calling code looks like this:

std::string s = "Connie is learning C++";

// Extract the "Connie" substring
std::string s1{ s.c_str(), 6 };

DoSomethingCpp(s1);

The string that is printed out is “Connie”, as expected.

Then someone who knows about std::string_view, introduced in C++17, “modernizes” the above code by replacing std::string with std::string_view:

// Pass std::string_view instead of std::string const&
void DoSomethingCpp(std::string_view sv)
{
    DoSomethingLegacy(sv.data());
}

The calling code is modified as well:

std::string s = "Connie is learning C++";


// Use string_view instead of string:
//
//     std::string s1{ s.c_str(), 6 };
//
std::string_view sv{ s.c_str(), 6 };

DoSomethingCpp(sv);

The code is recompiled and executed. Unfortunately, the output has changed! Instead of the expected “Connie” substring, the entire string is now printed out:

Connie is learning C++

What’s going on here? Where does that “magic string” come from?

Analysis of the Bug

Well, the key to figuring out this bug is understanding that std::string_views are not necessarily NUL-terminated. On the other hand, the legacy C-interface function does expect a C-style NUL-terminated string as input (passed via const char*).

In the initial code, a std::string object was created to store the “Connie” substring:

// Extract the "Connie" substring
std::string s1{ s.c_str(), 6 };

This string object was then passed via const& to the DoSomethingCpp function, which in turn invoked the string::data method, and passed the returned C-style string pointer to the DoSomethingLegacy C-interface function.

Since strings managed via std::string objects are guaranteed to be NUL-terminated, the pointer returned by the string::data method pointed to a NUL-terminated contiguous sequence of characters, which was what the DoSomethingLegacy function expected. Everyone’s happy.
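As a minimal sketch of that guarantee, the following helpers check what a C function would see when handed the pointer from string::data. The names CopyPrefix and CLengthOfString are hypothetical and are not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies the first `count` characters of `source`
// into a new std::string, using the same (const char*, count)
// constructor as the calling code in the article.
std::string CopyPrefix(const std::string& source, std::size_t count)
{
    // Reading more characters than the source holds would be out of bounds.
    assert(count <= source.size());
    return std::string{ source.c_str(), count };
}

// Hypothetical helper: the length that a C function such as strlen or
// printf("%s") sees when it is given only the pointer returned by data().
std::size_t CLengthOfString(const std::string& s)
{
    return std::strlen(s.data());
}
```

With s holding “Connie is learning C++”, CopyPrefix(s, 6) yields a new string whose own buffer contains “Connie” followed by a NUL, so CLengthOfString reports 6 for it.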

On the other hand, when std::string is replaced with std::string_view in the calling code:

// Use string_view instead of string:
//
//     std::string s1{ s.c_str(), 6 };
//
std::string_view sv{ s.c_str(), 6 };

DoSomethingCpp(sv);

you lose the guarantee that the sub-string is NUL-terminated!

In fact, when sv.data is invoked inside DoSomethingCpp this time, the returned pointer points into the character buffer of the original string s, which holds the whole string “Connie is learning C++”. The string_view records a length of 6, but that length is lost as soon as only the raw pointer is passed on. There is no NUL-terminator after “Connie” in that buffer, so the legacy C function keeps reading past the “Connie” substring until it finds the NUL-terminator that follows the last character of “Connie is learning C++”, and it prints the whole string.
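The mismatch between the view's length and what C code sees can be checked with a small sketch. The helper names CLengthOfView and SharesBufferWith are hypothetical; the sketch also assumes that a NUL character is reachable from sv.data() inside the same buffer, as is the case in this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: the number of characters a C function such as
// strlen or printf("%s") reads when it is given only sv.data().
// Precondition (an assumption of this sketch): a NUL character is
// reachable from sv.data() within the same buffer; otherwise this
// reads out of bounds.
std::size_t CLengthOfView(std::string_view sv)
{
    return std::strlen(sv.data());
}

// Hypothetical helper: true if the view starts at the first character
// of the string's own buffer, i.e. no copy was made.
bool SharesBufferWith(std::string_view sv, const std::string& s)
{
    return sv.data() == s.data();
}
```

For the sv built in the calling code, sv.size() is 6, SharesBufferWith(sv, s) is true, and CLengthOfView(sv) is 22, the length of the whole original string.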

Figure: Figuring out the bug involving the (mis)use of string_view. The string_view points to a sub-string that does *not* include a NUL-terminator.

So, be careful when replacing std::string const& parameters with string_views! Don’t forget that string_views are not guaranteed to be NUL-terminated! That is very important when writing or maintaining C++ code that interoperates with legacy C or C-style code.
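One possible way to keep the string_view parameter and still call the legacy function safely is to copy the viewed characters into a std::string at the C boundary. This is only a sketch of one option, not necessarily the fix used in the original code base: the helper name ToNulTerminated is hypothetical, and the extra copy costs an allocation for longer strings.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Legacy C-interface function from the article.
void DoSomethingLegacy(const char* s)
{
    std::printf("DoSomethingLegacy: %s\n", s);
}

// Hypothetical helper: makes an owning copy of exactly the characters
// covered by the view. The copy is NUL-terminated because std::string
// guarantees a NUL right after its last character.
std::string ToNulTerminated(std::string_view sv)
{
    std::string copy(sv);
    assert(copy.c_str()[copy.size()] == '\0');
    return copy;
}

// Still takes a string_view, but no longer passes sv.data() to C code.
void DoSomethingCpp(std::string_view sv)
{
    const std::string copy = ToNulTerminated(sv);
    DoSomethingLegacy(copy.c_str());
}
```

Called with the 6-character view from the example, this version of DoSomethingCpp prints “DoSomethingLegacy: Connie” again, because the legacy function now reads from the copy and not from the buffer of the original string.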

2 thoughts on “The Case of string_view and the Magic String”

  1. If you want a C string pointer from a std::string, it’s better to call c_str, not data. Not only does the name directly reflect the intention, but since string_view has no c_str method you’ll get a compile error if you do something like the above example, replacing strings with string_views.

    1. This seems a good suggestion, although std::string::data is guaranteed to return a null-terminated string.
      From CppReference:
      “The returned array is null-terminated, that is, data() and c_str() perform the same function.”
      https://en.cppreference.com/w/cpp/string/basic_string/data
      In any case, the problem still exists (and causes nasty bugs) when people pass around string_views and somewhere in the call stack they assume that the string_views are null-terminated, like when they passed std::string const&.
