The next step, once we have some kind of shape histogram, is to figure out distances between histograms. The hip, up-to-date way of doing this is to use the earth mover's distance, the canonical paper for which is "The Earth Mover's Distance as a Metric for Image Retrieval". According to the paper, computing the distance reduces to a transportation problem. One histogram serves as the source/supply side, with the other being the destination/demand side. The bucket values are the supply and demand values, while the absolute difference in bucket indices is the distance between buckets. The paper suggests solving this with the transportation-simplex method, gives a couple of references, and then moves on to more interesting things. Those of us who actually have to implement things are then left grasping at straws.
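For the one-dimensional setup described above (buckets on a line, ground distance equal to the absolute difference in bucket indices), there is a well-known shortcut that avoids a solver entirely when both histograms carry the same total mass: the minimum transport cost equals the sum of the absolute values of the running surplus carried across each bucket boundary. The sketch below is not the transportation-simplex method from the paper, and it is not what FLOPC++ does. The function name emd1d is hypothetical, and rescaling both histograms to unit mass is an assumption made here so that supply equals demand.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the paper or from FLOPC++): earth mover's
// distance between two 1-D histograms with the same number of buckets,
// using |i - j| as the ground distance between buckets i and j.
// Assumption: each histogram is rescaled to unit mass so supply == demand.
double emd1d(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const double massA = std::accumulate(a.begin(), a.end(), 0.0);
    const double massB = std::accumulate(b.begin(), b.end(), 0.0);
    assert(massA > 0.0 && massB > 0.0);

    double carried = 0.0;  // surplus mass that has to cross the next bucket boundary
    double work = 0.0;     // total mass-times-distance moved so far
    for (std::size_t i = 0; i < a.size(); ++i) {
        carried += a[i] / massA - b[i] / massB;
        work += std::fabs(carried);  // moving the surplus one bucket over costs |carried|
    }
    return work;
}
```

As a sanity check, emd1d({1, 0, 0}, {0, 0, 1}) moves all of the mass across two buckets, so it comes out as 2. This only covers the equal-mass 1-D case; histograms with a different ground distance or with partial matching still need a real transportation solver.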
After briefly trying to find descriptions of the transportation-simplex method so that I could implement it myself, I decided that there must be an easier way. A quick Google search turned up FLOPC++, which seemed to do what I wanted and featured a very nice syntax (it even presented a code snippet for a transportation problem). However, it depended on COIN, an IBM-provided open source OR library. As usual, I feared an interminable chain of dependencies and/or broken Makefiles, but things were surprisingly easy. COIN actually had a Darwin target and built out of the box. FLOPC++ took some coaxing, since it was trying to build a shared library, something that the OS X build of gcc doesn't support per se. Changing the linker to libtool with a -static argument seemed to do the trick (I can't take much credit for figuring this out; the COIN Darwin makefile described all this, and it was very easy to lift their config code).
Once all the libraries were built, it was time to try out some simple test code. After some configuration confusion (why does Xcode have a header include path option in both the project and the specific target, both of which need to be set?), the code seemed to build, and I was all set to integrate COIN/FLOPC++ with the main codebase. All went well until I tried to compile, and then all hell broke loose. The 150+ compiler errors that I got were very mystifying, especially considering that the code had built just fine in the test project. After painstakingly checking all compiler options to no avail (the main Thor project uses the "Carbon Application" template, while the test one used the "C++ Tool" one, which could account for some differences), I finally tracked it down to the precompiled header that was being prepended to all source files. Unfortunately, seeing which of the #includes in the .pch file was to blame was a very tedious task, since the entire project had to be rebuilt after each trial (it didn't help that ZeroLinking seems to make Xcode more reluctant than it should be to recompile files). In the end, it turned out that any header that (even indirectly) included <Carbon/Carbon.h> caused things to go haywire. Spinning those off into a separate include seemed to fix the problem (unfortunately, I couldn't figure out how to make it precompiled as well, so compile times are longer for now).
Once this was all done and I got to actually run the code I had written, the results were inconclusive. When using hand-drawn circles and squares, the distances seemed to be indicative of something, but how accurate they are remains to be seen.
P.S. It appears that the Princeton network has been having difficulties with traffic directed towards mscape.com this afternoon (i.e. nothing gets through). This entry is brought to you courtesy of CoDeeN/PlanetLab and the MIT CoDeeN proxy.