I really enjoy programming, especially when there is a concrete specification of both the input and the output of the program. In my experience, receiving a good specification with a description of the input and output is rather rare in the software engineering field. That is why I like grabbing an RFC from time to time and implementing it.
Lately I have been looking at UTF-8 decoding and encoding, which is specified in RFC 3629. The great thing about RFCs is that they contain a complete specification, and they are easy to read and understand.
When decoding a UTF-8 byte stream you need an über function for counting the number of leading 1 bits in a byte. So we need a function that, given the input bits
1000 0000, returns 1, and given the input bits
1100 0000, returns 2, and so on. You would think that with such a small problem there would be only a small set of algorithms that solve it. However, the reality is quite different. I will show the different solutions that I tried out when programming this. Note that I have limited the solutions to only count the first 4 bits.
Solution 0: Mask and compare
This is what I consider the straightforward solution. In this solution we first filter out the bits to the left and check if they match a pattern. This is not the quickest solution, due to all the masking of bits.
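A sketch of what this could look like in C (the function name `leading_ones` is my own choice, not from the article):

```c
#include <stdint.h>

/* Solution 0: mask off the leading bits and compare against the full
   pattern. Limited to the first 4 bits, as in the article. */
static int leading_ones(uint8_t b)
{
    if ((b & 0xf0) == 0xf0) return 4;   /* 1111 xxxx */
    if ((b & 0xe0) == 0xe0) return 3;   /* 1110 xxxx */
    if ((b & 0xc0) == 0xc0) return 2;   /* 110x xxxx */
    if ((b & 0x80) == 0x80) return 1;   /* 10xx xxxx */
    return 0;                           /* 0xxx xxxx */
}
```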
Solution 1: Compare single bit
This is very similar to solution 0; the only difference is that instead of checking for a specific bit pattern on the left side of the byte, we check a single bit at a time. This works because counting the number of leading 1 bits is equivalent to finding where the first 0 bit is. The complexity of this solution is equal to solution 0, since it does the same number of masks and compares.
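In C this could look like the following sketch, testing one bit per step and stopping at the first 0 (again, `leading_ones` is my own name for it):

```c
#include <stdint.h>

/* Solution 1: walk the bits from the top and stop at the first 0 bit.
   The position of the first 0 bit equals the number of leading 1 bits. */
static int leading_ones(uint8_t b)
{
    if ((b & 0x80) == 0) return 0;   /* bit 7 is 0 */
    if ((b & 0x40) == 0) return 1;   /* bit 6 is 0 */
    if ((b & 0x20) == 0) return 2;   /* bit 5 is 0 */
    if ((b & 0x10) == 0) return 3;   /* bit 4 is 0 */
    return 4;                        /* first 4 bits are all 1 */
}
```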
Solution 2: Greater than
Another way of counting the number of leading 1 bits is to change the view of the problem. We know that if a byte is at least
0xf0, then it must have at least 4 leading 1 bits. Using a chain of greater-than comparisons we can figure out how many leading 1 bits we have.
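A possible sketch of this chain in C (my own naming):

```c
#include <stdint.h>

/* Solution 2: a chain of comparisons, largest threshold first.
   Any byte >= 0xf0 starts with 1111, >= 0xe0 with 111, and so on. */
static int leading_ones(uint8_t b)
{
    if (b >= 0xf0) return 4;
    if (b >= 0xe0) return 3;
    if (b >= 0xc0) return 2;
    if (b >= 0x80) return 1;
    return 0;
}
```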
Solution 3: Less than
Using greater-than comparisons is not optimal, because we check for the uncommon case first, and most of the time a UTF-8 byte stream has only a single byte per Unicode character. So we invert the check and use a chain of less-than comparisons instead.
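Inverting the chain gives a sketch like this, where the common single-byte (ASCII) case is tested first:

```c
#include <stdint.h>

/* Solution 3: same chain of comparisons, but with the most common
   case (a plain ASCII byte, no leading 1 bits) checked first. */
static int leading_ones(uint8_t b)
{
    if (b < 0x80) return 0;   /* 0xxx xxxx — single-byte character */
    if (b < 0xc0) return 1;   /* 10xx xxxx — continuation byte */
    if (b < 0xe0) return 2;   /* 110x xxxx */
    if (b < 0xf0) return 3;   /* 1110 xxxx */
    return 4;                 /* 1111 xxxx */
}
```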
Solution 4: Compiler builtin function
Given that I am the only one working on this code and I know what compiler I will be using, I can use builtin functions that the compiler provides. In my case I am using gcc, and it provides a builtin function for counting the number of leading zeros. So I will use that function and tweak the input and output so that it counts the number of leading 1 bits.
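The builtin in question is `__builtin_clz`, which counts leading zeros in an `unsigned int`. One way to adapt it, sketched below: invert the byte (so leading 1s become leading 0s), shift it into the top of a 32-bit word, and count. The zero guard is needed because `__builtin_clz(0)` is undefined:

```c
#include <stdint.h>

/* Solution 4: count leading 1 bits via gcc's count-leading-zeros builtin. */
static int leading_ones(uint8_t b)
{
    unsigned v = (uint8_t)~b;   /* leading 1s of b become leading 0s of v */
    if (v == 0)
        return 8;               /* all bits set; avoid __builtin_clz(0), which is undefined */
    /* Place the inverted byte in the top 8 bits of a 32-bit word; the
       leading zeros counted there equal the leading 1 bits of b. */
    return __builtin_clz(v << 24);
}
```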
Solution 5: Assembler instruction
The next step I tried looks a lot like the previous solution, but this time I'm using inline assembler code. The x86 architecture has a lot of nice instructions and one of them is
bsr (Bit Scan Reverse). This instruction scans for the most significant 1 bit and returns its index. So given the input
0x0f, the bsr instruction will return 3, since the leftmost 1 bit in
0x0f is at bit index 3. We can use this instruction to calculate the number of leading 1 bits.
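A sketch of how this could be wired up with gcc inline assembler (x86 only; the function name and exact constraints are my own). The trick is the same inversion as before: `bsr` on the inverted byte finds the highest 0 bit of the original, and subtracting that index from 7 gives the number of leading 1 bits:

```c
#include <stdint.h>

/* Solution 5: count leading 1 bits using the x86 bsr instruction. */
static int leading_ones(uint8_t b)
{
    unsigned v = (uint8_t)~b;   /* highest 0 bit of b becomes highest 1 bit of v */
    unsigned idx;
    if (v == 0)
        return 8;               /* bsr leaves the result undefined for a zero input */
    __asm__ ("bsr %1, %0" : "=r"(idx) : "r"(v));
    return 7 - (int)idx;
}
```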
Solution 6: Small table lookup
All the previous solutions calculate the result in some way. There is another way of getting the correct answer without doing all these calculations, and that is by using a lookup table. In this solution we pre-calculate all the answers and put them in a table, so a simple lookup gives us the correct result. This technique is used quite often when speed is important.
In this solution we use a small table, since only the 4 leftmost bits are of importance when decoding UTF-8. So we need a single shift before we can do the lookup.
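A sketch of the small-table variant: shifting the byte right by 4 leaves the top nibble, which indexes a 16-entry table:

```c
#include <stdint.h>

/* Solution 6: 16-entry table indexed by the top 4 bits of the byte. */
static const int leading_ones_table[16] = {
    0, 0, 0, 0, 0, 0, 0, 0,   /* 0xxx -> 0 */
    1, 1, 1, 1,               /* 10xx -> 1 */
    2, 2,                     /* 110x -> 2 */
    3,                        /* 1110 -> 3 */
    4                         /* 1111 -> 4 */
};

static int leading_ones(uint8_t b)
{
    return leading_ones_table[b >> 4];   /* one shift, one lookup */
}
```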
Solution 7: Table lookup
This solution is basically the same as solution 6; the difference is that we use a larger table in order to save the extra shift each time we do a lookup. This is the fastest way I have found of calculating the number of leading 1 bits in a byte.
The external table is not included in this example, since it contains 256 entries. Instead of writing all these entries by hand, I used a small Python script to generate the code.
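Rather than reproducing the generated 256-entry table here, the sketch below fills it at startup with a loop; the lookup itself then needs no shift at all. (With a full byte as index the table naturally holds counts up to 8, not just 4.)

```c
#include <stdint.h>

/* Solution 7: 256-entry table indexed directly by the byte.
   Here the table is built at startup instead of being generated
   as source code by a script, as the article does. */
static int leading_ones_table[256];

static void init_table(void)
{
    for (int i = 0; i < 256; i++) {
        int n = 0;
        while (n < 8 && (i & (0x80 >> n)))   /* count 1 bits from the top */
            n++;
        leading_ones_table[i] = n;
    }
}

static int leading_ones(uint8_t b)
{
    return leading_ones_table[b];   /* no shift needed */
}
```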
When given a problem like counting the number of leading bits, it is not always obvious what the optimal algorithm is. Sometimes it is not even an algorithm at all; it's just a table lookup. Anyway, these experiments are a good way of learning all the different ways of solving a problem, and I encourage people to try them out.
Find your favorite RFC and start implementing :)