Computer Vision: December 2011

Today, I would like to show you my first OCR module. The system is based on HOG and PHOG features, and I used a Random Forest for recognition. I also compared my recognition rate with that of the Tesseract OCR system (run without any preprocessing).

Features:

The HOG feature extractor works as follows:

Resize the character image to a patch of 32x32 pixels (see the examples below).
Calculate the magnitude and the orientation of the gradient from its X-axis and Y-axis components.
Split the magnitude and orientation matrices into 8x8 cells. Extract the HOG of each cell, group the cells into blocks of 2x2 cells, and normalize each block.
Concatenate all block HOG features.
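The steps above can be sketched in plain Python. This is only a minimal illustration, not the exact code behind the results in this post. It assumes several details the text does not specify: cells of 8x8 pixels, blocks that overlap with a stride of one cell, unsigned orientations in [0, 180) degrees, hard binning without interpolation, and L2 block normalization. The resize step is omitted, so the input is expected to be a 32x32 grayscale patch already. The function names (`gradients`, `cell_histogram`, `hog`) are made up for this example.

```python
import math

def gradients(img):
    """Central-difference gradient magnitude and unsigned orientation in degrees."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp indices at the borders so every pixel has two neighbours.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def cell_histogram(mag, ang, y0, x0, size, n_bins):
    """Magnitude-weighted orientation histogram of one square cell."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            # min() guards against an angle that rounds up to exactly 180.
            b = min(int(ang[y][x] / bin_width), n_bins - 1)
            hist[b] += mag[y][x]
    return hist

def hog(img, cell=8, block=2, n_bins=10, eps=1e-6):
    """Concatenate L2-normalized histograms of overlapping blocks of cells."""
    mag, ang = gradients(img)
    rows, cols = len(img) // cell, len(img[0]) // cell
    cells = [[cell_histogram(mag, ang, r * cell, c * cell, cell, n_bins)
              for c in range(cols)] for r in range(rows)]
    features = []
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            # Gather the histograms of the block x block cells into one vector.
            vec = [v for dr in range(block) for dc in range(block)
                   for v in cells[r + dr][c + dc]]
            norm = math.sqrt(sum(v * v for v in vec)) + eps
            features.extend(v / norm for v in vec)
    return features

# Toy 32x32 patch with a single vertical edge (dark left half, bright right half).
patch = [[0.0] * 16 + [255.0] * 16 for _ in range(32)]
print(len(hog(patch)))  # 3 x 3 blocks * 4 cells * 10 bins = 360
```

Under these assumptions a 32x32 patch yields 4x4 cells and 3x3 overlapping blocks, so the descriptor length grows linearly with the number of orientation bins (36 values per bin).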

Figure 1) Samples from the ICDAR 2003 data set: easy and hard examples (a, a, p, C, R, S, E, i, E).

The PHOG feature module is similar to the HOG one: it applies the same procedure at every pyramid level, but without grouping cells into blocks.
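A pyramid version of the same idea might look like the sketch below. The post does not state the number of pyramid levels or how the final vector is normalized, so both are assumptions here: three levels (1x1, 2x2 and 4x4 grids) and normalization of the whole vector to unit sum, which is a common choice for PHOG. The function names are illustrative only.

```python
import math

def orientation_histogram(img, y0, y1, x0, x1, n_bins):
    """Magnitude-weighted histogram of unsigned gradient orientations in a region."""
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(y0, y1):
        for x in range(x0, x1):
            # Central differences with indices clamped at the image borders.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / bin_width), n_bins - 1)
            hist[b] += math.hypot(gx, gy)
    return hist

def phog(img, levels=3, n_bins=10):
    """Concatenate cell histograms over a spatial pyramid, then normalize once."""
    h, w = len(img), len(img[0])
    features = []
    for level in range(levels):
        n = 2 ** level  # the grid at this level is n x n cells
        for r in range(n):
            for c in range(n):
                features.extend(orientation_histogram(
                    img, r * h // n, (r + 1) * h // n,
                    c * w // n, (c + 1) * w // n, n_bins))
    total = sum(features)
    # No block grouping: a single normalization over the whole descriptor.
    return [v / total for v in features] if total > 0 else features

# Toy 32x32 patch with a single vertical edge.
patch = [[0.0] * 16 + [255.0] * 16 for _ in range(32)]
print(len(phog(patch)))  # (1 + 4 + 16) cells * 10 bins = 210
```

The only normalization here is global, whereas HOG normalizes each 2x2 block locally. That difference is one plausible place to look when comparing the two descriptors.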

Experiment setup:

The ICDAR 2003 database provides a training set and a testing set. Together, these two sets contain 71 classes. I used 100 trees with a depth of 50 for recognition. The figure below shows the accuracy rate versus the number of orientation bins.
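For readers who want to try a similar configuration, here is a rough sketch using scikit-learn. The library choice is an assumption, since the post does not name the Random Forest implementation it used, and "depth of 50" is interpreted as the maximum tree depth. The data below is synthetic and only stands in for real descriptors; `fake_descriptor` is a hypothetical helper, and the printed number says nothing about ICDAR 2003.

```python
import random
from sklearn.ensemble import RandomForestClassifier

random.seed(0)

def fake_descriptor(label, dim=360):
    """Synthetic stand-in for a feature vector: Gaussian noise centred on the label."""
    return [random.gauss(label, 0.5) for _ in range(dim)]

# Two toy classes instead of the 71 character classes of the real data set.
y_train = [i % 2 for i in range(200)]
X_train = [fake_descriptor(label) for label in y_train]
y_test = [i % 2 for i in range(50)]
X_test = [fake_descriptor(label) for label in y_test]

# 100 trees, each allowed to grow to a depth of at most 50.
forest = RandomForestClassifier(n_estimators=100, max_depth=50, random_state=0)
forest.fit(X_train, y_train)

# score() returns the fraction of test samples that were classified correctly.
accuracy = forest.score(X_test, y_test)
print(f"accuracy: {accuracy:.2%}")
```

To reproduce a curve of accuracy versus number of bins, the same fit and score steps would be repeated once per bin count, with the descriptors recomputed each time.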

Looking at these accuracy rates, there are two points to discuss. First, the Tesseract library is not good at recognizing natural scene characters, since it is trained on scanned documents. Second, HOG features seem to encode characters better than PHOG features. Perhaps grouping cells into blocks is more discriminative than the pyramid grouping.

Finally, the best accuracy rates of HOG and PHOG with Random Forest are about 71% and 61% respectively (quantized with 10 bins), while Tesseract achieves 37.25%.

My best accuracy rate reaches 71%; however, the Stanford University result shows 81.7%.

Thursday, December 22, 2011

OCR module