Fastest way to clamp an integer to the range 0-255

  • I'm working on some image processing code that can generate pixel values outside of the normal range of 0 to 255, and I'd like to clamp them back into the valid range. I know that there are saturating SIMD instructions that make this a moot point, but I'm trying to stay within standard C++ code for the moment.

    The fastest I've been able to do on my Athlon II is the following:

    BYTE Clamp(int n)
    n &= -(n >= 0);
    return n | ((255 - n) >> 31);

    This compiles down into the following assembly with MSVC 6.0:

    setns dl
    neg edx
    and eax, edx
    mov edx, 255
    sub edx, eax
    sar edx, 31
    or dl, al

    Is there any improvement possible?

    Does it have to be signed?

    why AND'ing with 0xFF is not an option?

    @Pubby, yes the values can be less than 0 or greater than 255.

    @littleadv, the idea is that values outside the range take on the closest value within the range. I.e. you want -1 to become 0, and ANDing will not accomplish that.

    OK, got it:) I'm not familiar with the term "clamping" for that:)

    The funny thing is that when I loop the function 900000000 times over a set of 3 different inputs (-1, 256, 20), I only get a 300ms as opposed to using if statements to do the range checking/altering. It'd be interesting to see if you can get it faster though.

    Is it any faster to check if the high byte (0xff00) is non-zero then set to 255?

    It may be worth it to test if it's in range if that is the most likely case, taking less time for a valid value and a little more time for out of range numbers.

    @jswolf19, that's a good thought - I think over 99% of my data will be within range. Unfortunately with modern processors it's faster to treat everything the same than it is to take different paths for different conditions.

    I can think of way to average only 1.0 instructions/`int` on x86 and (0.5 instructions/`int` on x64) just to check if the value is in range. So you could run this over a block size (unrolled and everything) and for the rare chance that you do have an out-of-range value, you pay the misprediction penalty + fix up code...

    @Mystical, right now it's inline with the code that generates the values. I don't think I'd gain anything by saving the intermediate values and making a second pass.

    VC6? Really? Aren't newer compilers cleverer with optimisations and faster?

    @AshleysBrain, I also have VS 2010 Express and I should probably try it there too for completeness. I'm too dependent on MFC to make it my compiler of choice.

    This seems like a parallelizable task; I wonder if splitting a large image into n chunks could help some.

    @Leonid, it's very parallelizable but you'd still want each subtask to run as fast as it can.

    In a sad way, it's good to see that in 2011, somebody was (is?) still using MSVC6 for some things. I'm in the same, sad, situation. :-}

    You should use `sizeof(int)`, which is not necessarily 32.

    @200_success, actually it would be `(sizeof(int)*CHAR_BIT)-1`. You're correct of course, but I'm unlikely to use this code on any platform where an `int` isn't 32 bits.

    I independently arrived at an answer similar to what @Mysticial posted: `(n &= (~n >> 31)) | (0xFF - n) >> 31`, a one-liner which works great and is a bit sleeker. I was going to post a new answer because I had additional remarks but don't yet have enough reputation to answer a protected question. In short, my comments were about testing the input cases carefully, especially in-and-around the range `0x80000000`-`0x800000FF` (which should all result in `0`). Mixing bitwise with arithmetic operations can be tricky, causing bugs which aren't noticed because they only show up near `0x80000000`.

    C++17 has `std::clamp()`, which seems well optimized by GCC, but the output of Clang is suboptimal, see godbolt.

    @MarkRansom Done!

    @200_success, I'd be quite interested to learn of a platform where `sizeof (int)` *is* as large as 32...

  • Ira Baxter

    Ira Baxter Correct answer

    9 years ago


     int x=n>255?255:n;
    ... x<0?0:x ...

    I'd expect this to produce something like:

     mov     ebx, 255
    mov eax, n
    cmp eax, ebx
    cmovg eax, ebx ; conditional mov instruction
    test eax, eax
    mov ebx, 0
    cmovl eax, ebx

    If you are using MSVC SIX, you may not get the conditional move instruction. Try switching to a modern version of visual studio.

    I do have MSVC 2010 express, so I could try it there. Unfortunately I have two conditions to test for - less than 0, and greater than 255.

    If this is a pixel value, why are you working with signed values?

    @MarkRansom: the casting to unsigned takes care of negative values

    @6502, casting -1 to unsigned will generate a large positive value which will be converted to 255. I'd rather it convert to 0.

    @MarkRansom: you are 100% right. I'll blame for this being on fever in a saturday morning :-)

    @IraBaxter, the algorithm produces values outside the range of 0 to 255 so an integer is necessary to to capture the full range, but of course the output must be constrained after the fact. This isn't uncommon in image processing.

    @6502: Mark wanted negative numbers to produce zero.

    Ira, you must have changed this code since I left my first comments. I finally got around to trying VS 2010, and this becomes the fastest method. It still doesn't use the conditional move instruction, but it does make a very interesting optimization.

    I did, but StackOverflow migrated the question, and lost the edit times, so you couldn't see it. I see your revised answer with the short (predicted) branch; the CMOV I think is still the fastest way to do this because there is no branch ever. If MS won't generate it directly, you can always make a bit of assembler code. Thanks for noticing the changes, and for the brownie point.

    I'm glad that we're at the point where the most straightforward code is also the fastest in this situation.

    Guys, how about this `x = 255 ^ ((n ^ 255) & -(n < 255));`? It will give you minimum of both. See this. Would this be fastest?

    Why is this marked as answer? There is no cmovgt (it's cmovg, but sure) instruction and a conditional move with immediate value doesn't exist. What am I missing here?

    Well, the idea sketch was OK. Objections noted and answer revised.

License under CC-BY-SA with attribution

Content dated before 7/24/2021 11:53 AM