Build it up
Here are the changes that need to be made in order to get WNLIB running on Mac OS X:
- Add ./ to your path if it's not already there (it isn't in the default OS X setup).
- Locate all #includes of <malloc.h> and make them refer to <sys/malloc.h> instead.
- Optionally, to remove a warning, change wn_system_memory_alloc_func_type in wnmem.h so that its argument is size_t instead of unsigned int. This also requires including <stdlib.h> and changing the type of the first parameter of lo_substitute_alloc in selftest_aux.c.
- A bunch of files #define their own value of INFINITY. This is also #defined in <math.h>, so to remove the resulting warning we can just remove the #define (WNLIB defines it as WN_FHUGE (1.0e30), while <math.h> uses HUGE_VALF (1e50f), so the two values are close enough).
- translate_errno in wnio.c seems to support two modes for converting error codes into a human-readable message. One is to use the strerror function (if present) and the other is to look up the error code in the sys_errlist system-wide global. Mac OS X seems to support both, but because of the way WNLIB tests whether strerror is available (checking whether linux or __CYGWIN__ is defined), we default to the second mode. Unfortunately the extern declaration that WNLIB uses doesn't match what is included in <stdio.h> (there's a const missing). The simplest fix is to use the strerror way instead, which can be accomplished by testing for the __APPLE__ define in addition to the others.
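The strerror fix in the last item can be sketched roughly as follows. This is only an illustration of the guard pattern: EXAMPLE_HAVE_STRERROR and describe_errno are hypothetical names, and the actual test in wnio.c may be spelled differently. (In C++, std::strerror is always available, so the guard here only mirrors what the C code has to do.)

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical feature macro (not WNLIB's): adds __APPLE__ to the
// platforms on which strerror is assumed to be usable.
#if defined(linux) || defined(__linux__) || defined(__CYGWIN__) || defined(__APPLE__)
#  define EXAMPLE_HAVE_STRERROR 1
#else
#  define EXAMPLE_HAVE_STRERROR 0
#endif

// Hypothetical stand-in for translate_errno: turn an errno value into text.
std::string describe_errno(int code) {
#if EXAMPLE_HAVE_STRERROR
    // Preferred mode: let the C library produce the message.
    return std::strerror(code);
#else
    // Fallback placeholder; the sys_errlist lookup would go here.
    return "error " + std::to_string(code);
#endif
}
```

With this in place, something like describe_errno(ENOENT) returns the platform's message for a missing file, without touching sys_errlist at all.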
Getting the library to build is one thing, but actually using it requires more effort. Since it is compiled with a C compiler and I'm using C++, I also had to wrap the #includes in an extern "C" {} block to keep the symbol name mangling consistent.
There is a wn_assert in wntrnf.c that checks whether total_capacity(i_capacities,len_i) and total_capacity(j_capacities,len_j) are equal. Unfortunately we're dealing with floating point numbers here, and precision issues do creep in. Making that assert a bit more lenient (only so many digits have to be equal) is necessary. Other precision issues also appear, mostly when comparing values with zero (especially when decrementing peripheries). Replacing those with comparisons to some epsilon value allows the code to run with real world data. More precisely, it allows wn_trans_problem_feasible, the first phase in solving the transport problem, to run.
All this gives me is a feasible solution, but I would like an optimal one (or a close approximation thereof). That is the job of the iterative function wn_trans_problem_simplex_improve, which unfortunately, despite all of my attempts, refuses to run (various asserts fail). I have for now given up on getting WNLIB to run (although its iterative approach made it appealing, since I could presumably get better running times out of it, at the expense of precision).
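For reference, the more lenient comparisons described above might look something like this. It is a minimal sketch, not what WNLIB does: approx_equal and effectively_zero are hypothetical helper names, and the default tolerances are arbitrary assumptions that would need tuning for real data.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true when a and b agree to within rel_tol,
// measured relative to the larger magnitude. The scale is clamped to
// at least 1.0 so the test becomes an absolute one for values near zero.
bool approx_equal(double a, double b, double rel_tol = 1e-9) {
    double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * std::max(scale, 1.0);
}

// Hypothetical helper: replaces an exact `x == 0.0` test with a
// comparison against a small epsilon.
bool effectively_zero(double x, double eps = 1e-9) {
    return std::fabs(x) <= eps;
}
```

The capacity check would then compare the two totals with approx_equal instead of ==, and the zero tests would go through effectively_zero.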
To continue with my string of failures for the day, I next moved on to extracting usable data from the IAM dataset that I recently downloaded. The problem is that the images have a top portion that is meant for OCR, and the actual handwritten sample sits in the lower three quarters or so. Ideally I only want this second part, but since the text is of varied length, it doesn't always start in the same place. My thinking was to take advantage of the fact that three horizontal lines mark the upper/lower boundaries of these sections, and by looking for those (rows with very low average values) I could see where to crop. Unfortunately the images have different intensities (in some cases the person used light pencil, in others very thick and dark marker), so it's hard to come up with a sure-fire way of detecting the lines. This isn't really as much of a failure as the WNLIB attempt, but it's still rather annoying that in the end nothing worked today.
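The row-average idea can be sketched as below. This is only an illustration under stated assumptions: the image is taken to be 8-bit grayscale in row-major order, find_dark_bands is a hypothetical name, and making the threshold a fraction of the image's own overall mean is my guess at coping with the varying intensities. It has not been tried on the IAM images and may well fail on them for the reasons given above.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns inclusive [first, last] row ranges whose mean intensity is
// below `fraction` times the mean intensity of the whole image.
std::vector<std::pair<std::size_t, std::size_t>>
find_dark_bands(const std::vector<std::uint8_t>& pixels,
                std::size_t width, std::size_t height,
                double fraction = 0.5) {
    std::vector<std::pair<std::size_t, std::size_t>> bands;
    if (width == 0 || height == 0 || pixels.size() < width * height) {
        return bands;  // nothing sensible to scan
    }

    // Mean intensity of each row, plus the overall mean.
    std::vector<double> row_mean(height, 0.0);
    double total = 0.0;
    for (std::size_t y = 0; y < height; ++y) {
        double sum = 0.0;
        for (std::size_t x = 0; x < width; ++x) {
            sum += pixels[y * width + x];
        }
        row_mean[y] = sum / static_cast<double>(width);
        total += row_mean[y];
    }
    const double threshold = fraction * (total / static_cast<double>(height));

    // Group consecutive dark rows into bands.
    bool in_band = false;
    std::size_t start = 0;
    for (std::size_t y = 0; y < height; ++y) {
        const bool dark = row_mean[y] < threshold;
        if (dark && !in_band) {
            start = y;
            in_band = true;
        } else if (!dark && in_band) {
            bands.emplace_back(start, y - 1);
            in_band = false;
        }
    }
    if (in_band) {
        bands.emplace_back(start, height - 1);
    }
    return bands;
}
```

If three bands come back, the rows between the second and third would be the candidate crop region for the handwritten part. Dense handwriting in dark marker could also produce dark rows, so a real version would probably need to check that a band spans most of the page width as well.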