Having trouble finding features

mrichey mrichey

I was very grateful when reading your book.

I had been struggling with a problem at work (well I'm still struggling, but at least I don't feel as lost). I'm trying to identify some characters, but having trouble with what features I should be using. Here are some of my images (that's them after I've done some processing).

I think I need global features, since neither rotation nor scale is really an issue for me. However, the "feature" array I do best with is a plain binary array representing the pixels (1 for white, 0 for black). I'm using a RandomForest classifier, but I'm trying to stick to the mantra of your book that good features are more important than the algorithm. I asked on reddit, but mainly got suggestions to change algorithms.

wr wr
Willi Richert (TwoToReal)

Hi Michael

Thanks for your nice words. I think I will have time tomorrow to have a closer look. Will that work out for you?

luispedro luispedro


My personal experience is that fiddling with algorithms can give you an extra 2-5%. So, if you're close to something usable and just want an extra boost, then it may be worth it; but better features can give you much larger gains, especially when you are still far from the goal.

I see you tried a bunch of stuff already in the direction of global features.

An alternative direction to try is to preprocess your images (downsampling, Gaussian smoothing [or, more generally, wavelet-based methods], edge filtering, ...) and feed in the resulting pixels.
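To make that concrete, here is a rough sketch of what such preprocessing could look like with NumPy and scipy.ndimage. The function names, the sigma value, and the block size are just placeholders I picked for illustration, not something tuned for your data, so treat it as a starting point rather than a recipe.

```python
import numpy as np
from scipy import ndimage


def smoothed_pixel_features(binary_img, sigma=1.0, block=2):
    """Blur a binary image, downsample it, and flatten it to a feature vector.

    sigma and block are illustrative defaults (assumptions), not tuned values.
    """
    img = np.asarray(binary_img, dtype=float)
    # Gaussian smoothing spreads each isolated white pixel over its neighbours.
    blurred = ndimage.gaussian_filter(img, sigma=sigma)
    # Crop so that both dimensions divide evenly by the block size.
    h, w = blurred.shape
    h, w = h - h % block, w - w % block
    # Downsample by averaging non-overlapping block x block tiles.
    tiles = blurred[:h, :w].reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3)).ravel()


def edge_features(binary_img, sigma=1.0):
    """Smooth the image, then return the Sobel gradient magnitude per pixel."""
    img = ndimage.gaussian_filter(np.asarray(binary_img, dtype=float), sigma=sigma)
    gx = ndimage.sobel(img, axis=1)  # horizontal gradient
    gy = ndimage.sobel(img, axis=0)  # vertical gradient
    return np.hypot(gx, gy).ravel()


# Toy example: an 8x8 image containing a single white pixel.
toy = np.zeros((8, 8))
toy[4, 4] = 1
smooth_vec = smoothed_pixel_features(toy)  # 16 values (a 4x4 grid, flattened)
edge_vec = edge_features(toy)              # 64 values (one per pixel)
```

Either vector (or both concatenated) can then go to the classifier in place of the raw 0/1 pixels. The idea is that nearby pixels now share information, so a character shifted by a pixel or two still produces a similar feature vector.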

HTH Luis

wr wr
Willi Richert (TwoToReal)

Exactly (DoorsOfPerception mentioned a somewhat similar approach in the reddit thread). With the characters represented by isolated points, as they are right now, the problem is more difficult than necessary for any learning method...

mrichey: Any concerns sharing your current code+data?