Grabbing images from a web camera

04 Oct 2013

One of the most amusing things to program is hardware. Seeing that a program you create actually does something with some electronic device gives a feeling of power and control. It feels like you are doing something creative and useful, as opposed to just shuffling data around.

Modern laptops almost always have a camera integrated in the lid, and in this note I will show you how to program against that camera.

So our first question is: where in the operating system is the webcam? On Linux, as on most Unix-like operating systems, almost everything is exposed via files, so we need to look for a file that represents our webcam. On Linux we also need to know that everything that has to do with video can be programmed using the Video4Linux API, or V4L2.
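To make the "exposed via files" idea concrete, here is a minimal sketch that checks whether a path is a character device node, which is the kind of file a video device shows up as. The helper name is_char_device is my own invention for illustration, and it only uses the standard stat call.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if path names a character device node,
 * 0 otherwise (including when the path does not exist). */
static int is_char_device(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    return S_ISCHR(st.st_mode) ? 1 : 0;
}
```

Calling is_char_device("/dev/null") returns 1 on a typical Linux system, while an ordinary directory such as "/" gives 0. A webcam node should behave like the former.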

We will use a video4linux command line utility for listing local video devices.

~$ v4l2-ctl --list-devices
Laptop_Integrated_Webcam_E4HD (usb-0000:00:1a.0-1.5):
  /dev/video0

Great, now I know the name of my webcam and also the device node, the file that represents this camera: /dev/video0. Just for fun, we will dig up more information on our local webcam.

~$ v4l2-ctl -D
Driver Info (not using libv4l2):
  Driver name   : uvcvideo
  Card type     : Laptop_Integrated_Webcam_E4HD
  Bus info      : usb-0000:00:1a.0-1.5
  Driver version: 3.11.1
  Capabilities  : 0x84000001
    Video Capture
    Streaming
    Device Capabilities
  Device Caps   : 0x04000001
    Video Capture
    Streaming

From the Bus info we can see that this is a USB camera. We can also see that it uses the uvcvideo driver, the Linux driver for the USB Video Class (UVC), which is the standard way of sending video data over USB. Finally, we can see the device capabilities. When grabbing video frames it is important that the device supports "Video Capture". Now we want to find out which video formats the device supports.

~$ v4l2-ctl --list-formats-ext
ioctl: VIDIOC_ENUM_FMT
  Index       : 0
  Type        : Video Capture
  Pixel Format: 'YUYV'
  Name        : YUV 4:2:2 (YUYV)
    Size: Discrete 640x480
      Interval: Discrete 0.033s (30.000 fps)
      Interval: Discrete 0.050s (20.000 fps)
      Interval: Discrete 0.067s (15.000 fps)
      Interval: Discrete 0.100s (10.000 fps)
      Interval: Discrete 0.200s (5.000 fps)
    Size: Discrete 352x288
      Interval: Discrete 0.033s (30.000 fps)
      Interval: Discrete 0.050s (20.000 fps)
      Interval: Discrete 0.067s (15.000 fps)
      Interval: Discrete 0.100s (10.000 fps)
      Interval: Discrete 0.200s (5.000 fps)
    Size: Discrete 320x240
      Interval: Discrete 0.033s (30.000 fps)
      Interval: Discrete 0.050s (20.000 fps)
      Interval: Discrete 0.067s (15.000 fps)
      Interval: Discrete 0.100s (10.000 fps)
      Interval: Discrete 0.200s (5.000 fps)
    Size: Discrete 176x144
      Interval: Discrete 0.033s (30.000 fps)
      Interval: Discrete 0.050s (20.000 fps)
      Interval: Discrete 0.067s (15.000 fps)
      Interval: Discrete 0.100s (10.000 fps)
      Interval: Discrete 0.200s (5.000 fps)
    Size: Discrete 160x120
      Interval: Discrete 0.033s (30.000 fps)
      Interval: Discrete 0.050s (20.000 fps)
      Interval: Discrete 0.067s (15.000 fps)
      Interval: Discrete 0.100s (10.000 fps)
      Interval: Discrete 0.200s (5.000 fps)
    Size: Discrete 1280x720
      Interval: Discrete 0.091s (11.000 fps)
      Interval: Discrete 0.200s (5.000 fps)

  Index       : 1
  Type        : Video Capture
  Pixel Format: 'MJPG' (compressed)
  Name        : MJPEG
    Size: Discrete 1280x720
      Interval: Discrete 0.033s (30.000 fps)
      Interval: Discrete 0.050s (20.000 fps)
      Interval: Discrete 0.067s (15.000 fps)
      Interval: Discrete 0.100s (10.000 fps)
      Interval: Discrete 0.200s (5.000 fps)

This is the most interesting part. We now know that this camera can provide us with either YUV 4:2:2 (YUYV) data or MJPEG data. YUYV is raw image data, which is useful if you want to compress it yourself to H.264 or MPEG video. However, I am more interested in the fact that this camera can provide me with compressed Motion JPEG data (MJPEG), which is basically just one JPEG image after another.

Using this information overload I can set a programming goal. I want to grab 1280x720 frames of MJPEG data from my webcam on /dev/video0. Sounds like a good plan.

So how do we even start to talk to the webcam at /dev/video0? Basically, all operations on hardware are exposed by the operating system through three system calls: read, write and ioctl. read gets bytes from the driver, write pushes bytes to the driver, and ioctl controls behaviour or sends options and configuration. When doing video work such as grabbing pictures from a camera, we can use the libv4l2 library. This library wraps the read, write and ioctl functions and gives us the structures that the video device driver understands.

The algorithm for getting video data is basically like this:

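#include <fcntl.h>      /* O_RDWR, O_NONBLOCK */
#include <libv4l2.h>

/* Outline only: request, argp and buffer are placeholders. */
int main(void) {
    int fd = v4l2_open("/dev/video0", O_RDWR | O_NONBLOCK, 0);  /* open the device */
    v4l2_ioctl(fd, request, argp);                /* configure the camera */
    v4l2_read(fd, buffer.start, buffer.length);   /* read frame data */
    v4l2_close(fd);                               /* release the device */
    return 0;
}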

First we open the video device, just like an ordinary file. Then we use ioctl to set the configuration that we need; this is where we configure the camera to provide us with 1280x720 MJPEG data. The next step is to read the actual data, and we finish by closing the file handle. This code sample is missing the actual parameters and the error handling code, so take a look at my mjpeg-grabber project on GitHub to see the full code. It currently takes only 325 lines of C to grab and store JPEG data from a webcam.