Beware of Integer Overflows in Your C++ Code

Summing signed integer values on computers, with a *finite* number of bits available for representing integer numbers (16 bits, 32 bits, whatever) is not always possible, and can lead to subtle bugs. Let’s discuss that in the context of C++, and let’s see how to protect our code against those bugs.

Suppose that you are operating on signed integer values, for example: 16-bit signed integers. These may be digital audio samples representing the amplitude of a signal; but, anyway, their nature and origin is not of key importance here.

You want to operate on those 16-bit signed integers, for example: you need to sum them. So, you write a C++ function like this:

#include <cstdint>  // for int16_t
#include <vector>   // for std::vector

// Sum the 16-bit signed integers stored in the values vector
int16_t Sum(const std::vector<int16_t>& values)
{
    int16_t sum = 0;

    for (auto num : values)
    {
        sum += num;
    }

    return sum;
}

The input vector contains the 16-bit signed integer values to sum. This vector is passed using const reference (const &), as we are observing it inside the function, without modifying it.

Then we use the safe and convenient range-for loop to iterate through each number in the vector, and update the cumulative sum.

Finally, when the range-for loop is completed, the sum is returned back to the caller.

Pretty straightforward, right?

Now, try and create a test vector containing some 16-bit signed integer values, and invoke the above Sum() function on that, like this:

std::vector<int16_t> v{ 10, 1000, 2000, 32000 };
std::cout << Sum(v) << '\n';

I compiled and executed the test code using Microsoft Visual Studio 2019 C++ compiler in 64-bit mode, and the result I got was -30526: a negative number?!

Well, if you try to debug the Sum function, and execute the function’s code step by step, you’ll see that the initial partial sums are correct:

10 + 1000 = 1010

1010 + 2000 = 3010

Then, when you add the partial sum of 3010 with the last value of 32000 stored in the vector, the sum becomes a negative number.

Why is that?

Well, if you think of the 16-bit signed integer type, the maximum (positive) value that can be represented is 32767. You can get this value, for example, invoking std::numeric_limits<int16_t>::max():

cout << "Maximum value representable with int16_t: \n";
cout << std::numeric_limits<int16_t>::max() << '\n';

So, in the above sum example, when 3010 is summed with 32000, the sum exceeds the maximum value of 32767, and you hit an integer overflow.

In C++, signed integer overflow is undefined behavior. In this case of Microsoft Visual C++ 2019 compiler on Windows, we got a negative number as a sum of positive numbers, which, from a “high level” perspective is mathematically meaningless. (Actually, if you consider the binary representation of these numbers, the result kind of makes sense. But, going down to this low binary level is out of the scope of this post; moreover, in any case, from a “high-level” common mathematical perspective, summing positive integer numbers cannot lead to a negative result.)

So, how can we prevent such integer overflows to happen and cause buggy meaningless results?

Well, we could modify the above Sum function code, performing some safety checks before actually calculating the sum.

// Sum the 16-bit signed integers stored in the values vector
int16_t Sum(const std::vector<int16_t>& values)
{
    int16_t sum = 0;

    for (auto num : values)
    {
        //
        // TODO: Add safety checks here to prevent integer overflows 
        //
        sum += num;
    }

    return sum;
}

If you think about it, if you are adding two positive integer numbers, what is the condition such that their sum is representable with the same signed integer type (int16_t in this case)?

Well, the following condition must be satisfied:

a + b <= MAX

where MAX is the maximum value that can be represented in the given type: std::numeric_limits<int16_t>::max() or 32767 in our case.

In other words, the above condition expresses in mathematical terms that the sum of the two positive integer numbers a and b cannot exceed the maximum value MAX representable with the given signed integer type.

So, the overflow condition is the negation of the above condition, that is:

a + b > MAX

Of course, as we just saw above, you cannot perform the sum (a+b) on a computer if the sum value overflows! So, it seems like a snake biting its own tail, right? Well, we can fix that problem simply massaging the above condition, moving the ‘a’ quantity on the right-hand side, and changing its sign accordingly, like this:

b > MAX – a

So, the above is the overflow condition when a and b are positive integer numbers. Note that both sides of this condition can be safely evaluated, as (MAX – a) is always representable in the given type (int16_t in this example).

Now, you can do a similar reasoning for the case that both numbers are negative, and you want to protect the sum from becoming less than numeric_limits::min, which is -32768 for int16_t.

The overflow condition for summing two negative numbers is:

a + b < MIN

Which is equivalent to:

b < MIN – a

Now, let’s apply this knowledge to modify our Sum function to prevent integer overflow. We’ll basically check the overflow conditions above before doing the actual sum, and we’ll throw an exception in case of overflow, instead of producing a buggy sum value.

#include <cstdint>    // for int16_t
#include <limits>     // for std::numeric_limits
#include <stdexcept>  // for std::overflow_error
#include <vector>     // for std::vector

// Sum the 16-bit signed integers stored in the values vector.
// Throws a std::overflow_error exception on integer overflow.
int16_t Sum2(const std::vector<int16_t>& values)
{
    int16_t sum = 0;

    for (auto num : values)
    {
        //
        // Check for integer overflow *before* doing the sum
        //
        if (num > 0 && sum > std::numeric_limits<int16_t>::max() - num)
        {
            throw std::overflow_error("Overflow in Sum function when adding a positive number.");
        }
        else if (num < 0 && sum < std::numeric_limits<int16_t>::min() - num)
        {
            throw std::overflow_error("Overflow in Sum function when adding a negative number.");
        }

        // The sum is safe
        sum += num;
    }

    return sum;
}

Note that if you add two signed integer values that have different signs (so, you are basically making a subtraction of their absolute values), you can never overflow. So you might think of doing an additional check on the signs of the variables num and sum above, but I think that would be a useless complication of the above code, without any real performance benefits, so I would leave the code as is.

So, in this blog post we have discussed signed integer overflow. Next time, we’ll see the case of unsigned integers.

6 thoughts on “Beware of Integer Overflows in Your C++ Code”

  1. Sum2 has the surprising property that the result (or rather, whether or not it throws) depends on the order of the input array. For example, the input [2000, 32000, -31000] will cause it to throw while [32000, -31000, 2000] will cause it to return 3000. The original Sum returns 3000 in both cases, by doing one overflow and then one underflow (or overflow with negative numbers), which effectively cancel one another. You could make Sum3 which is like Sum2, but instead of throwing in the loop, it just counts the overflows and underflows, and throws at the end if the number of overflows is not equal to the number of underflows.

    Like

    1. Thanks for your comment. I have mixed feelings about that. I mean, the problem is that we are dealing with **undefined behavior** here with signed integer overflows. Of course, having a sum value that depends on the order of the addends is better than getting your hard disk or SSD reformatted 😉 but we are still in the realm of **UB**. To avoid UB in this case, I would probably use a larger integer type, like int32_t or int64_t (but of course this is not the point of this article; the point of this article is introducing the problem of integer overflows in C++ and associated bogus results, and showing some checks to detect signed integer overflows and signal that, for example throwing an exception instead of returning a bogus result).

      Like

      1. Good point about the undefined behavior.

        I agree that having a Sum dependent on the order is better than one that can cause undefined behavior. But a basic invariant of Sum (in principle) is that it does not depend on the order of inputs due to the associativity and commutativity of addition. It makes it hard for the user to reason about. Going to some trouble to maintain the expected properties seems worthwhile.

        For instance, my Sum3 could be modified to avoid undefined behavior by adding in min() before overflowing (or subtracting out min() before underflowing), and then the counts would be how many min()s are added; and clean it up or throw at the end.

        Or here’s an idea for Sum4: Iterate over the vector with two indices, initialized at zero. If the partial sum is negative, look for the next positive value with one index, and add that. If the partial sum is positive, look for the next negative value with the other index, and add that. If one index gets to the end, do the check and throw before adding a value with the same sign as the partial sum. If both indices get to the end, return the result.

        Like

Leave a comment