About performance and precision: Using the camera, FLAR toolkit and PaperVision

Posted on September 30, 2009


This article is about performance issues with FLAR toolkit and PaperVision. It is what I have deduced from the code; these are assumptions based on tests I made and have not yet been scientifically validated.

FLAR toolkit basics:

Loading the markers

  1. Each marker is loaded into a marker object and placed in an Array

Getting the X/Y/Z position and tilt / rotation

  1. The camera registers an image and presents it as streaming data to Flash
  2. Flash reads the streaming data from the camera and presents it in a Video object
  3. FLAR toolkit takes snapshots from the Video component, N times per second (where N can be a number between 1 and the frame rate)
    1. The snapshot is made to a BitmapData object
    2. The snapshot is made using BitmapData.draw()
  4. From the snapshot image, straight lines are detected, and from those lines potential markers (squares with equal sides) are identified
  5. Using the angles between the lines, the perspective distortion is calculated and transformed into the tilt of the object in “North/South” and “East/West” directions
  6. Using this information, the potential marker-image is flattened and straightened into a 2D shape
  7. From the flattened image, the size in pixels is calculated and compared to the “true” size of the image. This defines the Z-distance between the marker and the camera
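Step 7 amounts to the pinhole-camera relationship between true size, apparent size in pixels, and distance. A minimal sketch (in TypeScript rather than ActionScript, and with `focalLengthPx` as an assumed calibration constant, not something FLAR toolkit exposes directly):

```typescript
// Pinhole-camera distance estimate: a marker of known physical size that
// appears `apparentSizePx` pixels wide sits at roughly
//   z = focalLengthPx * trueSizeMm / apparentSizePx
// `focalLengthPx` (the focal length expressed in pixels) is a hypothetical
// calibration value used here only for illustration.
function estimateZ(trueSizeMm: number, apparentSizePx: number, focalLengthPx: number): number {
  return (focalLengthPx * trueSizeMm) / apparentSizePx;
}
```

The same relation explains the jitter described below: a distant marker only spans a handful of pixels, its measured size can only change in whole-pixel steps, and so the estimated Z jumps in big steps.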

Recognizing the pattern

  1. Once the possible marker is flattened and straightened, it can be compared to the patterns generated from the marker-images
  2. By overlaying the found “markers” with the patterns (which are plain text files containing numbers from 0 to 255, very likely representing the “pixels” of the marker image), the marker is matched to a pattern
  3. When a match is found, the FLAR toolkit returns the Transformation Matrix to the main application. The ID of the recognized marker matches the location of the pattern in the Array.
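The overlay comparison in step 2 can be sketched as a lowest-difference search over the flattened 0–255 pattern values. This is a toy version in TypeScript, not the actual FLAR toolkit matcher: it ignores the rotation handling and confidence threshold the real implementation needs, but it shows why the winning pattern's Array index can serve as the marker ID.

```typescript
// Toy pattern matcher: candidate and patterns are flattened arrays of
// 0-255 brightness values (as in the clear-text pattern files described
// above). The best match is the pattern with the lowest sum of absolute
// differences; its index in the Array acts as the marker ID.
function matchPattern(candidate: number[], patterns: number[][]): number {
  let bestId = -1;
  let bestScore = Infinity;
  patterns.forEach((pattern, id) => {
    let score = 0;
    for (let i = 0; i < pattern.length; i++) {
      score += Math.abs(candidate[i] - pattern[i]);
    }
    if (score < bestScore) {
      bestScore = score;
      bestId = id;
    }
  });
  return bestId;
}
```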


The smaller the marker-image becomes in the camera registration, the coarser the “measurement” of the marker image becomes. Especially tilt and Z-distance will show jittery / jumpy behaviour. The reason is that the only reference FLAR toolkit has is the pixels in the image. The fewer pixels, the harder it becomes to get a precise result.

The best way to correct for this is on the 3D object itself: take the X/Y/Z and rotationX/Y/Z parameters and apply a “damper” to them. Basically:

  1. Take the previous value
  2. Take the new value
  3. Subtract the old value from the new value (calculate the delta)
  4. Divide the delta by the number of steps you want to use to reach the new value (e.g. between 5 and 20 steps)
  5. Add the divided delta to the old value and make that the current value.
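The five steps above boil down to one line per frame and per property. A sketch in TypeScript (the same logic translates directly to ActionScript; `obj` and `marker` in the usage comment are hypothetical names):

```typescript
// One damping step: move the current value a fraction of the way toward
// the target. `steps` between 5 and 20 trades smoothness against lag -
// more steps means smoother but slower to catch up.
function damp(current: number, target: number, steps: number): number {
  const delta = (target - current) / steps;
  return current + delta;
}

// Called once per frame for each tracked property, e.g.:
//   obj.x = damp(obj.x, marker.x, 10);
//   obj.rotationY = damp(obj.rotationY, marker.rotationY, 10);
```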

What you will see is that the shape behaves more organically and the jittery behavior is smoothed out.


The following is what I measured. It is indicative:

  1. Basic flash movie streaming the camera image to a Video object: 20% processor usage
  2. Making snapshots of the camera image (10 times per second): 20 to 30% processor usage
  3. Distilling / recognizing FLAR markers: 15% to 25% processor usage
  4. Running PaperVision: the remaining processor capacity, depending on the number of visible/rendered vertices in your scene

Distilling the images: why it costs

I have been playing with several aspects of image recognition (and gained respect for the people who implemented it!). Here is why it costs so much:

  1. There is a blur applied. In most cases to clean up the image from artifacts / noise.
  2. There is a Convolution filter applied: either to sharpen the image or get the outlines.
  3. In many cases a threshold is used to get rid of all shades of “grey”

Both Blur and Convolution filters are expensive, i.e. they take a lot of processor time, as they are performed per pixel. So the larger the image (width x height), the more CPU cycles it takes to process. As we are talking 2D, an image of 640 x 480 pixels takes 4 times as much CPU time as an image half that size in each dimension (320 x 240).
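The factor of 4 follows directly from the pixel counts; a tiny sketch to make the arithmetic explicit:

```typescript
// Per-pixel filters scale with width * height, so halving BOTH dimensions
// quarters the work - cost grows quadratically when the image scales
// uniformly.
function pixelCount(width: number, height: number): number {
  return width * height;
}

const full = pixelCount(640, 480); // 307,200 pixels to filter
const half = pixelCount(320, 240); //  76,800 pixels to filter
// full / half === 4
```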

For Flash and camera-related things, the default capture size is 320 x 240 pixels. These are the reasons for that:

  1. Its dimensions can be divided by 8 (a byte)
  2. It is a factor of the number of pixels the camera offers by default (the image from a webcam is either 320 x 240 or 640 x 480). Scaling 640 to 50% means “leave out every other pixel”; scaling 640 x 480 to 60% means “calculate which pixels to show and which to skip”, which costs extra (wasted) CPU cycles.
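The difference between the two scaling cases can be illustrated with a nearest-neighbour downscale of a single row of pixels (a TypeScript sketch; real scalers typically interpolate, which costs even more):

```typescript
// Nearest-neighbour downscale of one row. At exactly 50% the source index
// is simply i * 2 (take every second pixel); at an arbitrary factor like
// 60% every output pixel needs a division and rounding to find its source
// pixel - the extra "wasted" cycles mentioned above.
function downscaleRow(row: number[], factor: number): number[] {
  const outLength = Math.floor(row.length * factor);
  const out: number[] = [];
  for (let i = 0; i < outLength; i++) {
    out.push(row[Math.floor(i / factor)]);
  }
  return out;
}
```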


With the current version (Great White) PaperVision can handle about 12,000 vertices at a reasonable frame rate (measured with one processor on a MacBook Pro, 2008 model).

If you want to respect this limit, simply divide the maximum number of vertices by the number of objects to define how many vertices each object can have. That will give you an indication.

VerticesPerObject = MaxNumberOfVertices / NumberOfObjects

The rule is simple: if you want to present a lot of objects or want optimal performance, reduce the number of vertices.
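As code, the budget formula is trivial; the 12,000 figure is my Great White measurement above and should be re-measured on your own hardware:

```typescript
// Vertex budget per object: divide the total budget evenly over the
// objects in the scene. Rounded down so the sum never exceeds the budget.
function verticesPerObject(maxNumberOfVertices: number, numberOfObjects: number): number {
  return Math.floor(maxNumberOfVertices / numberOfObjects);
}

// e.g. 20 objects on a 12,000-vertex budget leaves 600 vertices each.
```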

Posted in: Experiments