How to Declare a C++ Function that Takes a Blob of Memory?

Discussing several options, starting from the good old C-style void* pointer.

An interesting question you may ask in C++ is: “How would you declare a function that takes a blob of memory as input?”

For example, think of a function that hashes some input data (using SHA-256, or whatever hash algorithm), or a function that takes some binary data and writes that to disk.

Coming from my C background, an option that came to mind would certainly be:

void DoSomething(const void* p, size_t numBytes)

You simply pass a const void* pointer to the beginning of the input memory block, and the total size of the memory block, expressed in bytes.
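
To make this concrete, here is a minimal sketch of what the body of such a function might look like. The name HashBlob and the choice of 32-bit FNV-1a are assumptions made purely for illustration (FNV-1a is a simple non-cryptographic hash, not SHA-256). The point is only that the single cast to a byte pointer lives inside the function, so the caller never sees it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: hashes a blob of memory with 32-bit FNV-1a.
// The function name and the hash algorithm are illustrative assumptions.
std::uint32_t HashBlob(const void* p, std::size_t numBytes)
{
    // A null pointer is only acceptable for an empty blob.
    assert(p != nullptr || numBytes == 0);

    // The only cast is here, inside the function.
    // Inspecting any object through unsigned char* is allowed by the language.
    const unsigned char* bytes = static_cast<const unsigned char*>(p);

    std::uint32_t hash = 2166136261u;      // FNV offset basis
    for (std::size_t i = 0; i < numBytes; ++i) {
        hash ^= bytes[i];
        hash *= 16777619u;                 // FNV prime
    }
    return hash;
}

// Usage sketch: the call site needs no cast.
inline std::uint32_t HashExample()
{
    const char text[] = { 'a' };           // a one-byte blob
    return HashBlob(text, sizeof(text));
}
```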

Then, some C++ programmer could start complaining: “Hey, why do you use the unsafe old C-style void* pointer? Use some safe explicit type like uint8_t, which clearly represents an 8-bit byte!”.

So, they propose to “step up” to the following prototype:

void DoSomething(const uint8_t* p, size_t numBytes)

Now, suppose that you want to pass to this function a custom structure, like this:

struct MyCustomData {
    ...
};

MyCustomData data;

With the original void* version, you can invoke the function simply and clearly like this:

DoSomething(&data, sizeof(data));

The code is very clear and straightforward: you pass a pointer to the custom data structure, and its size in bytes. That’s it. Simple and clear.

On the other hand, with the “safe and modern” uint8_t prototype, the function call gets more complicated, as you need to add a type cast:

// void DoSomething(const uint8_t* p, size_t numBytes)
//
// DoSomething(&data, sizeof(data));
//
// This gives a compiler error when the function expects 
// a const uint8_t* instead of const void*, something like:
//
// Error: cannot convert 'MyCustomData*' to 'const uint8_t*'
//
// You need an explicit cast in this case!
DoSomething(
    reinterpret_cast<const uint8_t*>(&data), 
    sizeof(data)
);

Why should people complexify and uglify their C++ code with the uint8_t pointer (or std::byte), when void* works just fine??

Moreover, someone could even say: “Hey, in modern C++20, we have std::span! Use it!”

Well, congratulations on raising the complexity and noise of the code even further!

In fact, std::span is a class template, so somebody might suggest making the function that processes the generic memory blob a function template! Really? Something like this??

template <typename T>
void DoSomething(std::span<T> data)

Or maybe something even more complicated, like this?

template <typename T, std::size_t N>
void DoSomething(std::span<T, N> data)

// Or this?
template <typename T, std::size_t N>
void DoSomething(std::span<const T, N> data)

Wow. With std::span the complexity-meter hits the red zone and goes even higher!

Someone may suggest something like std::span<const uint8_t>. But that’s still more complex than the initial void* signature.

Do you want a pointer to a generic memory blob? C++ already has it from C: it’s called void*! Use it and enjoy.

I really dislike this attitude of some “modern” C++ programmers, who make choices that end up making the code more complex, uglier, and harder to write and understand.

It seems that some people are really losing the taste for good readable code.

Some good old habits from C can still be put to good use in C++, like the void* pointer and the size parameter.


BTW: As a nice addition, if you use SAL annotations, the function could be decorated a bit to help code analyzers detect memory bugs:

void DoSomething(
  _In_reads_bytes_(numBytes) const void * p,
  _In_ size_t numBytes
);

The _In_reads_bytes_ annotation applied to the pointer parameter explicitly states that the pointer points to input read-only memory (_In_reads_), and the size of this input buffer expressed in bytes (_bytes_) is represented by the numBytes parameter.

In this way, we still keep the clarity and simplicity of the function invocation:

DoSomething(&data, sizeof(data));

while also adding pieces of information that are helpful to spot memory bugs with code analyzers and other tools.
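
SAL annotations come from the <sal.h> header that ships with the Microsoft toolchain. For code that also has to build with other compilers, one possible approach is to define the two annotations used here as empty macros when not compiling with MSVC. The sketch below assumes that approach; the _MSC_VER check, the empty fallback macros, and the CountNonZeroBytes function are illustrative assumptions, not something the SAL documentation prescribes.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER)
#include <sal.h>                     // the real SAL annotations
#else
// Assumption: outside MSVC the annotations simply expand to nothing.
#define _In_
#define _In_reads_bytes_(size)
#endif

// Hypothetical example function using the annotated signature.
std::size_t CountNonZeroBytes(
    _In_reads_bytes_(numBytes) const void* p,
    _In_ std::size_t numBytes)
{
    assert(p != nullptr || numBytes == 0);

    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    std::size_t count = 0;
    for (std::size_t i = 0; i < numBytes; ++i) {
        if (bytes[i] != 0) {
            ++count;
        }
    }
    return count;
}
```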

If you want to learn more about SAL annotations, you can start by reading this MSDN documentation: Using SAL Annotations to Reduce C/C++ Code Defects.

3 thoughts on “How to Declare a C++ Function that Takes a Blob of Memory?”

  1. When I started my professional career, I was lucky to be hired by a boss
    who cared about forming a group of people capable, more or less, of working
    together. That implied a similar knowledge of coding architecture tools, a similar style, etc.
    That was certainly a hard thing to build over time, but at least there were attempts in that direction.
    I didn’t grasp the importance of that at the beginning and thought, mistakenly, that it
    was a common approach.
    There were many advantages in doing that, something I understood later. One of
    those was the ability to switch from one project to another (for several reasons, such as a project followed by a colleague)
    without feeling desperate, which enormously reduced the cognitive overload; another was the ability to be up and running
    ASAP, and so on.
    It goes without saying that no one was tied to a chair. Everyone was free to leave and pursue their own professional life.

    In your case, I wouldn’t use terms like “complexify and uglify”, which belong more to a subjective sphere than to an engineering
    world like coding, developing, etc.

    Allow me to summarize your article.
    Your claim is, the first call to DoSomething() is *simply and clearly* (as everyone can see with his eyes, my note) because
    it is just that.
    The others, the one with the uint8_t type and the one with std::span, on the other hand, are more verbose (the term is mine) and
    not immediately “simply and clearly”. That’s true. And as a fan of simplicity, as I am, I couldn’t agree more.

    However, there is something going on under the hood here that doesn’t appear at first glance, and I feel it is important
    to point it out.

    **
    When you call your first DoSomething() function, the one that takes a const void *, and pass it something, be it an array of chars, ints,
    or simply a structure, what happens under the hood (and this is a heritage from C that C++ has) is decay of type:
    from a particular type to a generic one, with a loss of information. Examples are an int to a long (in C++ you can narrow in the opposite direction and be warned),
    an array of chars to a pointer to char, a derived class to its base class, and so on.

    The opposite direction is problematic and couldn’t be otherwise.
    How does one know in theory and in advance if a void * points to a structure instead of an array of double without additional info?
    That’s why C++ has its cast operators. They are verbose because they want to highlight that something unusual is going on, not necessarily wrong,
    but certainly something that deserves more attention.

    One aside to your example. The second parameter to DoSomething() carries an important piece of information: the number of _bytes_ the first parameter
    points to! So in theory there is no need to hide this information in the first place. DoSomething() could have been
    declared simply as
    DoSomething( const byte* pointerToBytes, size_t numOfBytes );
    and been simple and clear about its intention.

    Unless DoSomething() is some sort of Win32 WriteFile() API, I’m sure DoSomething() is going to do some processing on that data,
    and how could it possibly know what that data represents if the pointer points to something unknown? I’m sure that DoSomething() was going to C-cast
    that pointer anyway…

    **

    Luckily C++ is flexible enough, and I hope it remains so, to let you use it in the way that fits your needs rather than follow a passing trend.
    An aside to your argument: I’m not a fan of templates, and I use them only when the advantages clearly outweigh the price to pay for using
    them. Debugging a program under pressure, or coding and finishing a project under a deadline, are examples where simplicity of code
    makes a difference, in my opinion.
    Over my career, I’ve worked for several companies, and so on several legacy codebases built over decades and made of thousands
    upon thousands of LOC. And what I’ve always seen was a mess of different styles all mixed together and patched in a chaotic way just to keep it running
    one way or another.

    So, simple and clear code, _that is fixable in a minute by anyone_, is always preferable.


  2. This is just moving the frontier where you’ll need to cast. With a void* parameter, you’ll need to cast inside the function to u8 or whatever you expect. With a u8* parameter, the caller must cast, but your function is now safe and clear about what it expects. You’re shifting the responsibility for the type checking to the caller, so your code is safer. If the caller uses a struct Something instead of a u8, it’s their fault, not yours.

    It’s better for the function signature to express the type it expects to work on than for the caller to *guess* what it could be.

    If your code can expect multiple types that share a common “trait”, like being plain old data (u8, u16, u32, …) or integers or whatever, write a template version using a concept. It’s one more line to write, but it helps whoever **reads** your function’s signature to understand what it does instead of guessing and finding out, at runtime, that it crashed. Also, it helps the compiler ensure at compile time that the constraints you’ve declared are satisfied, so you won’t be able to produce undefined behavior anymore.


    1. I prefer having the cast inside the function (if that’s the case), but let the user of the function write simpler and clearer code, without any unnecessary cast at the call site.

