Build it up

Here are the changes that need to be made in order to get WNLIB running on Mac OS X:

  • Add ./ to your PATH if it's not already there (it isn't in the default OS X setup).
  • Locate all #includes of <malloc.h> and make them refer to <sys/malloc.h> instead.
  • Optionally, to remove a warning, change wn_system_memory_alloc_func_type in wnmem.h so that its argument is size_t instead of unsigned int. This also requires including <stdlib.h> and changing the type of the first parameter of lo_substitute_alloc in selftest_aux.c.
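  • Several files #define their own value of INFINITY. This is also #defined in <math.h>, so the compiler warns about the redefinition; to remove the warning we can just remove WNLIB's #define. WNLIB defines it as WN_FHUGE (1.0e30), while <math.h> uses HUGE_VALF (1e50f). Both serve as a "bigger than anything else" value, so they are close enough for that purpose, although 1e50f is beyond the range of a float (roughly 3.4e38) and so presumably ends up as a true infinity rather than a large finite number.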
  • translate_errno in wnio.c seems to support two ways of converting error codes into a human-readable message. One is to use the strerror function (if present) and the other is to look up the error code in the sys_errlist system-wide global. Mac OS X seems to support both, but because of the way WNLIB tests whether strerror is available (checking if linux or __CYGWIN__ is defined), we default to the second. Unfortunately, the extern declaration that WNLIB uses doesn't match the one in <stdio.h> (there's a const missing). The simplest fix is to make it use strerror instead, which can be accomplished by testing for the __APPLE__ define in addition to all the others.
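
As a rough sketch of that last fix (this is not WNLIB's actual code: error_message and SKETCH_HAVE_STRERROR are names I made up for illustration, and the real translate_errno may test more defines than the ones I listed), the guard ends up looking something like this:

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Assumption: the library decides at compile time whether strerror is usable.
// Adding __APPLE__ to the list routes Mac OS X to the strerror branch.
#if defined(linux) || defined(__CYGWIN__) || defined(__APPLE__)
#define SKETCH_HAVE_STRERROR 1
#endif

// Hypothetical stand-in for translate_errno: turns an errno value into text.
std::string error_message(int err) {
#ifdef SKETCH_HAVE_STRERROR
    // Preferred path: let the C library produce the message.
    return std::strerror(err);
#else
    // Placeholder for the sys_errlist lookup, which needs an extern
    // declaration that matches the system headers exactly (const included).
    return "error " + std::to_string(err);
#endif
}
```

Calling error_message(ENOENT) on OS X then goes through strerror, and the mismatched sys_errlist declaration is never compiled.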

Getting the library to build is one thing, but actually using it requires more effort. Since it is compiled with a C compiler and I'm using C++, I also had to wrap the #includes in an extern "C" {} block to keep the symbol names consistent (otherwise the C++ compiler looks for mangled names that the library doesn't export).

There is a wn_assert in wntrnf.c that checks whether total_capacity(i_capacities,len_i) and total_capacity(j_capacities,len_j) are equal. Unfortunately we're dealing with floating point numbers here, and precision issues do creep in, so that assert has to be made a bit more lenient (only so many digits have to be equal). Other precision issues also appear, mostly when comparing values with zero (especially when decrementing peripheries). Replacing those with comparisons against some epsilon value allows the code to run with real-world data. More precisely, it allows wn_trans_problem_feasible, the first phase in solving the transport problem, to run.

All this gives me is a feasible solution, but I would like an optimal one (or a close approximation thereof). That is the job of the iterative function wn_trans_problem_simplex_improve, which unfortunately, despite all of my attempts, refuses to run (various asserts fail). I have for now given up on getting WNLIB to run, although its iterative approach made it appealing, since I could presumably get better running times out of it at the expense of precision.
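
For reference, the kind of lenient comparison I mean looks roughly like the sketch below. The helper names (approx_equal, is_effectively_zero) and the tolerance values are my own illustrative choices, not anything from WNLIB, and the right tolerances depend on the scale of the data.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true when a and b agree to within a relative
// tolerance, with an absolute floor so values near zero still compare sanely.
bool approx_equal(double a, double b,
                  double rel_tol = 1e-9, double abs_tol = 1e-12) {
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}

// Hypothetical helper: replaces an exact "x == 0.0" test with an epsilon test.
bool is_effectively_zero(double x, double eps = 1e-9) {
    return std::fabs(x) <= eps;
}
```

The capacity assert would then check approx_equal(total_i, total_j) instead of exact equality, and the comparisons with zero would go through is_effectively_zero.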

To continue with my string of failures for the day, I next moved on to extracting usable data from the IAM dataset that I recently downloaded. The problem is that the images have a top portion that is meant for OCR, and the actual handwritten sample sits in the lower three quarters or so. Ideally I only want this second part, but since the text is of varied length, it doesn't always start in the same place. My thinking was to take advantage of the fact that three horizontal lines mark the upper and lower boundaries of these sections: by looking for those (rows with a very low average pixel value) I could see where to crop. Unfortunately the images have different intensities (in some cases the person used light pencil, in others a very thick and dark marker), so it's hard to come up with a sure-fire way of detecting the lines. This isn't really as much of a failure as the WNLIB attempt, but it's still rather annoying that in the end nothing worked today.
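
The row-average idea itself is simple enough to sketch. This assumes an 8-bit grayscale image stored row by row in a flat buffer, with 0 as black; find_dark_rows and its threshold parameter are hypothetical names for illustration, and picking a threshold that works across all the scans is exactly the part I couldn't get right.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the indices of rows whose mean intensity falls below `threshold`.
// Assumes 8-bit grayscale pixels in row-major order (0 = black, 255 = white).
std::vector<std::size_t> find_dark_rows(const std::vector<std::uint8_t>& pixels,
                                        std::size_t width, std::size_t height,
                                        double threshold) {
    std::vector<std::size_t> rows;
    // Reject empty rows and buffers too small for the stated dimensions.
    if (width == 0 || pixels.size() < width * height) return rows;

    for (std::size_t y = 0; y < height; ++y) {
        unsigned long long sum = 0;
        for (std::size_t x = 0; x < width; ++x) {
            sum += pixels[y * width + x];
        }
        const double mean = static_cast<double>(sum) / static_cast<double>(width);
        if (mean < threshold) rows.push_back(y);
    }
    return rows;
}
```

Runs of consecutive indices in the result would correspond to the separator lines, and the crop would start just below the second one.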
