Datasets and Databases

One problem that I will eventually have to deal with is finding a good dataset to evaluate Thor against. Since sketch searching is still a nascent research area, there don't seem to be any good "benchmarks" that I could run my code against. I could rely on images returned by a Google image search, but that means putting a lot of faith in my sketch extraction, and it also means a lot of cleaning up and categorizing by hand.

One alternative is to use handwriting data, collections of which are much more numerous. The CEDAR database seems to be popular, but getting access to it seems to involve ordering a CD-ROM. There may be something in this list of computer vision test images, but it'll require some digging. More helpfully, the IAM Database from Switzerland is available online in the form of high-resolution TIFF scans. The fact that the scans are so large may help, since one issue with handwriting data is that it's very high-frequency and not very representative of the kind of writing that people will be doing on whiteboards. There's still the problem of verifying results, since most of these databases are recognition-oriented, whereas I'm looking for similar shapes.

N.B. Most of the database sites were found through this links page.
