Parsing Bitstream

27 Oct 2014

Dealing with bytes in a programming language is quite straight forward. There are language features for extracting a single byte from a set bytes. However in many low-level cases you might want to deal with bits instead. This is especially useful when parsing audio and video bitstreams.

To solve this bit-parsing task I have decided to create a nice abstraction on top of stream of bytes to be able to handle it like it was a stream of bits. So I have created this c++ interface.

#pragma once

#include <cstdint>
#include <cstddef>

class BitStream
{
  const uint8_t * begin;
  const uint8_t * end;
  size_t pos;

public:
  BitStream(const uint8_t * bits, size_t len);

  uint32_t get(uint8_t len);
  void skip(uint8_t len);
};

The idea here is that you already have already read a certain number of bytes into a buffer, and then you want to use this BitStream abstraction as a convenient way of parsing the bits from left to right.

Now we will take a look at the different functions of this class. Lets take the skip function first since it is the easiest.

void BitStream::skip(uint8_t len)
{
  pos = pos+len;
}

Since the internal pos field holds the current bit position of the stream we just need to increase it to skip forward. Next we will take a look at the meat of the class, the get function.

uint32_t BitStream::get(uint8_t len)
{
  size_t byte_offset = pos/8;
  uint8_t bit_pos = pos % 8;
  uint8_t bit_avail = 8 - bit_pos;

  if (begin+byte_offset >= end)
  {
    throw std::out_of_range("No more bits");
  }

  if (bit_avail >= len)
  {
    uint32_t value = begin[byte_offset];
    uint8_t mask = 0xff >> bit_pos;
    value = value & mask; // remove everything to the left of the bits we need
    value = value >> (bit_avail - len);
    pos = pos + len;
    return value;
  }
  else
  {
    uint32_t value = get(bit_avail);
    value = value << (len - bit_avail);
    value = value | get(len - bit_avail);
    return value;
  }
}

Lets go through this code. Since all the bits are stored as a sequence of bytes we first need to find which byte contains the current bit position.

size_t byte_offset = pos/8;

Next we need to know the bit position within the current byte we are working with.

uint8_t bit_pos = pos % 8;

And the last thing we need is to know how many bits can we extract from this byte without moving to the next one.

uint8_t bit_avail = 8 - bit_pos;

The next thing I do is to check if we have moved outside the range of the bitset. This is to protect the user against buffer overflow problems. Next we will do the actual extraction of bits and turn them into values that we can return to the user. If you think about the problem of extracting n bits. You have basically two cases:

  1. Either all the bits are available in the current byte
  2. You need to extract bits from multiple bytes

The first case is the easies to implement so we to that first.

if (bit_avail >= len)
{
  uint32_t value = begin[byte_offset];
  uint8_t mask = 0xff >> bit_pos;
  value = value & mask; // remove everything to the left of the bits we need
  value = value >> (bit_avail - len);
  pos = pos + len;
  return value;
}

First we pick out the full byte value of the current byte. Then we mask off all the bits to the left of bit_pos and then we shift the resulting value down to remove all the bits to the right of the value. The last thing we do is to increase the bit position so that we know where to begin the next time around.

Now for the hard part, what can we do if we need more bits than are currently available? We must of course first grab all the bits that are available and then move to the next byte and do the same process. This can be implemented either with a loop or as I have done with a recursive function.

else
{
  uint32_t value = get(bit_avail);
  value = value << (len - bit_avail);
  value = value | get(len - bit_avail);
  return value;
}

That's it, so next time you're going to parse a stream of bits you have a nice abstraction to use.