Counting Bits

01 Oct 2013

I really enjoy programming, especially when there is a concrete specification of both the input and the output of the program. From my experience receiving a good specification with description of the input and output is rather rare in the software engineering field. That is why I like grabbing an rfc from time to time and implement it.

Lately I have been looking at UTF-8 decoding and encoding which is specified in rfc-3629. The great thing with rfc's is that they contain a complete specification and they are easy to read and understand.

When decoding a UTF-8 byte stream you need an über function for counting the number of leading 1 bits in a byte. So we need a function the given the input bits 1000 0000 returns 1 and given the input bits 1100 0000 returns 2 and so on. You would think that with such a small problem, there are a small set of algorithms that solves this. However the reality is quite different. I will show the different solutions that I tried out when programming this. Note that I have limited the solutions to only count the first 4 bits.

Solution 0: Mask and compare

This is what I consider the straight forward solution. In this solution we first filter out the bits to the left and check if they match a pattern. This is not the quickest solution due to all the masking of bits.

int mask(uint8_t value)
{
  if ((value & 0x80) == 0x0)  // 0xxx xxxx
    return 0;
  if ((value & 0xc0) == 0x80) // 10xx xxxx
    return 1;
  if ((value & 0xe0) == 0xc0) // 110x xxxx
    return 2;
  if ((value & 0xf0) == 0xe0) // 1110 xxxx
    return 3;
  return 4;
}

Solution 1: Compare single bit

This is very similar to solution 0, the only difference is that instead of checking for a specific bit pattern on the left side of the byte we are only checking a single bit. This works because checking the number of leading 1 bits is equivalent to checking where the first 0 bit is. The complexity of this solution is equal to solution 0 since it is doing the same number of masks and compares.

int single(uint8_t value)
{
  if ((value & 0x80) == 0x0) // 1000 0000
    return 0;
  if ((value & 0x40) == 0x0) // 0100 0000
    return 1;
  if ((value & 0x20) == 0x0) // 0010 0000
    return 2;
  if ((value & 0x10) == 0x0) // 0001 0000
    return 3;
  return 4;
}

Solution 2: Greater than

Another way of checking the number of leading 1 bits is to change the view of the problem. We know that if a byte is larger than for instance 0xf0 then it must have at least 4 leading 1 bits. Using a chain of greater than comparisons we can figure out how many leading 1 bits we have.

int greater(uint8_t value)
{
  if (value > 0xf0)
    return 4;
  if (value > 0xe0)
    return 3;
  if (value > 0xc0)
    return 2;
  if (value > 0x80)
    return 1;
  return 0;
}

Solution 3: Less than

Using greater than comparison is not that optimal because we check for the uncommon case first, and most of the time a UTF-8 byte stream has only a single byte per unicode character. So we invert the check and use a chained list of less than comparisons instead.

int less(uint8_t value)
{
  if (value < 0x80)
    return 0;
  if (value < 0xc0)
    return 1;
  if (value < 0xe0)
    return 2;
  if (value < 0xf0)
    return 3;
  return 4;
}

Solution 4: Compiler builtin function

Given the fact that I am the only one working on this code and I know what compiler i will be using I can start using builtin functions that the compiler provides. In my case I am using gcc and it provides a builtin function for counting the number of leading zeros. So I will use that function and tweak the input and output so I make it count the number of leading 1 bits.

int builtin(uint8_t value)
{
  uint8_t invers = ~value;
  return __builtin_clz(invers) - 24;
}

Solution 5: Assembler instruction

The next step I tried looks a lot like the previous solution, but this time I'm using inline assembler code. The x86 architecture has a lot of nice instructions and one of them is bsr (Bit Scan Reverse). This instruction will search for the first 1 bit and return the position of the bit. So given the input 0x0f the bsr instruction will return 3 since the leftmost 1 bit in 0x0f is on bit index 3. We can use this instruction to calculate the number of leading 1 bits.

int assembler(uint8_t value)
{
  int msb; // Position of first 1 bit
  uint8_t invers = ~value;
  asm("bsrl %1,%0" : "=r"(msb) : "r"((unsigned)invers));
  return 7 - msb;
}

Solution 6: Small table lookup

All the previous solutions they calculate the result in some way. There is another way of getting the correct answer without doing all these calculations and that is by using a lookup table. In this solution we pre-calculate all the answers and put it in a table. So a simple lookup will give us the correct result. This technique is used quite often when speed is important.

In this solution we use a small table since only the 4 left bits are of importance when decoding UTF-8. So we need a single shift before we can do the lookup.

int table[] = {
  0, // 0000
  0, // 0001
  0, // 0010
  0, // 0011
  0, // 0100
  0, // 0101
  0, // 0110
  0, // 0111
  1, // 1000
  1, // 1001
  1, // 1010
  1, // 1011
  2, // 1100
  2, // 1101
  3, // 1110
  4, // 1111
};

int small_table(uint8_t value)
{
  return table[value >> 4];
}

Solution 7: Table lookup

This solution is basically the same as solution 6, but the difference is that we use a larger table in order to save the extra shift each time we do the lookup. This is the fastest way I have found of calculating the number of leading bits in a byte.

The external table is not included in this example since it contains 256 entries. Instead of writing all these entries by hand I used a small python script to generate the code.

extern int table[];

int big_table(uint8_t value)
{
  return table[value];
}

Conclusion

When given a problem like count the number of leading bit's it is not always obvious what the optimal algorithm is. Sometimes it is not even an algorithm at all, it's just a table lookup. Anyway these experiments are a good way of learning all the different ways of solving a problem and I encourage people to try them out.

Find your favorite rfc and start implementing :)