Beware of Unsafe Conversions from size_t to int

Considering the Unicode conversion code between UTF-16 and UTF-8 using the C++ Standard Library strings and the WideCharToMultiByte and MultiByteToWideChar Win32 APIs, there’s an important aspect regarding the interoperability of the std::string and std::wstring classes at the interface of the aforementioned Win32 APIs.

For example, when you invoke the WideCharToMultiByte API to convert from UTF-16 to UTF-8, the fourth parameter (cchWideChar) represents the number of wchar_ts to process in the input string:

// The WideCharToMultiByte Win32 API declaration from MSDN:
// https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte

int WideCharToMultiByte(
  [in]            UINT                               CodePage,
  [in]            DWORD                              dwFlags,
  [in]            _In_NLS_string_(cchWideChar)LPCWCH lpWideCharStr,
  [in]            int                                cchWideChar,
  [out, optional] LPSTR                              lpMultiByteStr,
  [in]            int                                cbMultiByte,
  [in, optional]  LPCCH                              lpDefaultChar,
  [out, optional] LPBOOL                             lpUsedDefaultChar
);

As you can see from the function documentation, this cchWideChar “input string length” parameter is of type int.

On the other hand, the std::wstring::length/size methods return a value of type size_type, which is basically a size_t.

If you build your C++ code with Visual Studio in 64-bit mode, size_t, which is a typedef for unsigned long long, represents a 64-bit unsigned integer.

On the other hand, an int for the MS Visual C++ compiler is a 32-bit integer value, even in 64-bit builds.

So, when you pass the input string length from wstring::length/size to the WideCharToMultiByte API, you have a potential loss of data from 64-bit size_t (unsigned long long) to 32-bit int.

Moreover, even in 32-bit builds, when both size_t and int are 32-bit integers, you have signed/unsigned mismatch! In fact, in this case size_t is an unsigned 32-bit integer, while int is signed.

This is not a problem for strings of reasonable length. But, for example, if you happen to have a 3 GB string, in 32-bit builds a conversion from size_t to int will generate a negative number, and a negative length for a string doesn’t make sense. On the other hand, in 64-bit builds, if you a 5 GB string, converting from size_t to int will produce an int value of 1 GB, which is not the original string length.

The following table summarizes these kinds of bugs:

Build mode	size_t Type	int Type	Potential Bug when converting from size_t to int
64-bit	64-bit unsigned integer	32-bit signed integer	A “very big number” (e.g. 5GB) can be converted to an incorrect smaller number (e.g. 5GB -> 1GB)
32-bit	32-bit unsigned integer	32-bit signed integer	A “very big number” (e.g. 3GB) can be converted to a negative number (e.g. 3GB -> -1GB).

Potential bugs with size_t-to-int conversions

Note a few things:

size_t is an unsigned integer in both 32-bit and 64-bit target architectures (or build modes). However, its size does change.
int is always a 32-bit signed integer, in both 32-bit and 64-bit build modes.
This table applies to the Microsoft Visual C++ compiler (tested with VS 2019).

You can have some fun experimenting with these kinds of bugs with this simple C++ code:

// Testing "interesting" bugs with size_t-to-int conversions.
// Compiled with Microsoft Visual C++ in Visual Studio 2019
// by Giovanni Dicanio

#include <iostream>     // std::cout
#include <limits>       // std::numeric_limits

int main()
{
    using std::cout;

#ifdef _M_AMD64
    cout << " 64-bit build\n";
    const size_t s = 5UI64 * 1024 * 1024 * 1024; // 5 GB
#else
    cout << " 32-bit build\n";
    const size_t s = 3U * 1024 * 1024 * 1024; // 3 GB
#endif

    const int n = static_cast<int>(s);

    cout << " sizeof size_t: " << sizeof(s) << "; value = " << s << '\n';
    cout << " sizeof int:    " << sizeof(n) << "; value = " << n << '\n';
    cout << " max int:       " << (std::numeric_limits<int>::max)() << '\n';
}

Sample bogus conversion from size_t to int: a 5 giga size_t is silently converted to a 1 giga int. — Sample bogus conversion: a 5 giga size_t value is “silently” converted to a 1 giga int value

So, these conversions from size_t to int can be dangerous and bug-prone, in both 32-bit and 64-bit builds.

Note that, if you just try to pass a size_t value to a parameter expecting an int value, without static_cast<int>, the VC++ compiler will correctly emit warning messages. And these should trigger some “red lights” in your head and suggest that your C++ code needs some attention.

Writing Safer Conversion Code

To avoid the above problems and subtle bugs with size_t-to-int conversions, you can check that the input size_t value can be properly and safely converted to int. In such positive case, you can use C++ static_cast<int> to perform the conversion, and correctly suppress C++ compiler warning messages . Else, you can throw an exception to signal the impossibility of a meaningful conversion.

For example:

// utf16.length() is the length of the input UTF-16 std::wstring,
// stored as a size_t value.

// If the size_t length exceeds the maximum value that can be
// stored into an int, throw an exception
constexpr int kIntMax = (std::numeric_limits<int>::max)();
if (utf16.length() > static_cast<size_t>(kIntMax))
{
    throw std::overflow_error(
        "Input string is too long: size_t-length doesn't fit into int.");
}

// The value stored in the size_t can be *safely* converted to int:
// you can use static_cast<int>(utf16.length()) for that purpose.

Note that I used std::numeric_limits from the C++ <limits> header to get the maximum (positive) value that can be stored in an int. This value is returned by std::numeric_limits<int>::max().

Fixing an Ugly Situation of Naming Conflict with max

Unfortunately, since Windows headers already define max as a preprocessor macro, this can create a parsing problem with the max method name of std::numeric_limits from the C++ Standard Library. As a result of that, code invoking std::numeric_limits<int>::max() can fail to compile. To fix this problem, you can enclose the std::numeric_limits::max method call with an additional pair of parentheses, to prevent against the aforementioned macro expansion:

// This could fail to compile due to Windows headers 
// already defining "max" as a preprocessor macro:
//
// std::numeric_limits<int>::max()
//
// To fix this problem, enclose the numeric_limits::max method call 
// with an additional pair of parentheses: 
constexpr int kIntMax = (std::numeric_limits<int>::max)();
//                      ^                             ^
//                      |                             |
//                      *------- additional ( ) ------*

Note: Another option to avoid the parsing problem with “max” could be to #define NOMINMAX before including <Windows.h>, but that may cause additional problems with some Windows Platform SDK headers that do require these Windows-specific preprocessor macros (like <GdiPlus.h>). As an alternative, the INT_MAX constant from <limits.h> could be considered instead of the std::numeric_limits class template.

Widening Your Perspective of size_t-to-int Conversions and Wrapping Up

While I took the current series of blog posts on Unicode conversions as an occasion to discuss these kinds of subtle size_t-to-int bugs, it’s important to note that this topic is much more general. In fact, converting from a size_t value to an int can happen many times when writing C++ code that, for example, uses C++ Standard Library classes and functions that represent lengths or counts of something (e.g. std::[w]string::length, std::vector::size) with size_type/size_t, and interacts with Win32 APIs that use int instead (like the aforementioned WideCharToMultiByte and MultiByteToWideChar APIs). Even ATL/MFC’s CString uses int (not size_t) to represent a string length. And similar problems can happen with third party libraries as well.

A reusable convenient C++ helper function can be written to safely convert from size_t to int, throwing an exception in case of impossible meaningful conversion. For example:

// Safely convert from size_t to int.
// Throws a std::overflow_error exception if the conversion is impossible.
inline int SafeSizeToInt(size_t sizeValue)
{
    constexpr int kIntMax = (std::numeric_limits<int>::max)();
    if (sizeValue > static_cast<size_t>(kIntMax))
    {
        throw std::overflow_error("size_t value is too big to fit into an int.");
    }

    return static_cast<int>(sizeValue);
}

Wrapping up, it’s also worth noting and repeating that, in case of strings of reasonable length (not certainly a 3 GB or 5 GB string), converting a length value from size_t to an int with a simple static_cast<int> doesn’t cause any problems. But if you want to write more robust C++ code that is prepared to handle even gigantic strings (maybe maliciously crafted on purpose?), an additional check and potentially throwing an exception is a good safer option.

Beware of Unsafe Conversions from size_t to int

Writing Safer Conversion Code

Fixing an Ugly Situation of Naming Conflict with max

Widening Your Perspective of size_t-to-int Conversions and Wrapping Up

3 thoughts on “Beware of Unsafe Conversions from size_t to int”

Leave a comment Cancel reply

Writing Safer Conversion Code

Fixing an Ugly Situation of Naming Conflict with max

Widening Your Perspective of size_t-to-int Conversions and Wrapping Up

Share this:

Related

3 thoughts on “Beware of Unsafe Conversions from size_t to int”

Leave a comment Cancel reply