Little Things #

Date: Wednesday, December 31, 2003

Made the 8-way flood fill in the CapturedImage class (which is used to find connected changed pixels) use an in-place vector instead of recursive calls for performance/scalability reasons.
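A minimal sketch of what a non-recursive 8-way flood fill can look like. The Mask struct and the floodFill8 name are made up for illustration (this is not the actual CapturedImage code); the point is only that a std::vector used as an explicit stack takes the place of the call stack, so large regions can't overflow it.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the difference image: row-major, non-zero = changed pixel.
struct Mask {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;

    std::size_t index(int x, int y) const {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
               static_cast<std::size_t>(x);
    }
    bool changed(int x, int y) const {
        return x >= 0 && y >= 0 && x < width && y < height && pixels[index(x, y)] != 0;
    }
};

// Returns the 8-connected component of changed pixels containing (startX, startY).
// `visited` must hold width * height entries; it is updated in place so that
// repeated calls never return the same pixel twice.
std::vector<std::pair<int, int>> floodFill8(const Mask& mask, int startX, int startY,
                                            std::vector<bool>& visited) {
    std::vector<std::pair<int, int>> component;
    if (!mask.changed(startX, startY) || visited[mask.index(startX, startY)]) {
        return component;
    }

    std::vector<std::pair<int, int>> pending;  // explicit stack instead of recursion
    pending.push_back(std::make_pair(startX, startY));
    visited[mask.index(startX, startY)] = true;

    while (!pending.empty()) {
        const std::pair<int, int> p = pending.back();
        pending.pop_back();
        component.push_back(p);

        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                if (dx == 0 && dy == 0) continue;
                const int nx = p.first + dx;
                const int ny = p.second + dy;
                if (mask.changed(nx, ny) && !visited[mask.index(nx, ny)]) {
                    visited[mask.index(nx, ny)] = true;  // mark when pushed, not when popped
                    pending.push_back(std::make_pair(nx, ny));
                }
            }
        }
    }
    return component;
}
```

Marking pixels as visited when they are pushed (rather than when they are popped) keeps any pixel from being queued more than once, which bounds the size of the vector by the size of the region.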

I had previously noticed that thin vertical lines were not being picked up by the frame-to-frame difference algorithm. This turned out to be not because of the algorithm itself, but because of the despeckling that I did immediately after. The despeckling would remove any pixel which didn't have at least two non-background neighbors. In the case of a vertical line, since we'd start the scanning from the top, we'd remove the end-point since it only had a non-background pixel right below it. On the next scan-line, the same thing would happen, since this pixel had now become the end-point. The fix was to make the despeckling a two pass process. On the first pass, we simply count how many non-background neighbors each pixel has. On the second pass we remove those whose neighbor count is below the threshold of two and don't have any neighbors that satisfy the threshold. I also tweaked the despeckling so that if a background pixel was completely surrounded by non-background ones, it would be replaced with the average color of its neighbors. This removed one pixel "holes" that would throw off the thinning and stroke algorithms.
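Roughly, the two-pass version looks like the sketch below. It works on a plain binary grid (the Grid type and function names are assumptions for illustration) and leaves out the hole-filling step, since that needs actual pixel colors.

```cpp
#include <cstdint>
#include <vector>

// grid[y][x], non-zero = non-background. Hypothetical representation.
using Grid = std::vector<std::vector<std::uint8_t>>;

// Counts the non-background pixels among the 8 neighbors of (x, y).
static int countNeighbors(const Grid& g, int x, int y) {
    const int h = static_cast<int>(g.size());
    const int w = static_cast<int>(g[0].size());
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;
            const int nx = x + dx, ny = y + dy;
            if (nx >= 0 && ny >= 0 && nx < w && ny < h && g[ny][nx] != 0) ++count;
        }
    }
    return count;
}

Grid despeckle(const Grid& in, int threshold = 2) {
    const int h = static_cast<int>(in.size());
    if (h == 0) return in;
    const int w = static_cast<int>(in[0].size());

    // Pass 1: neighbor counts, computed from the untouched input image.
    std::vector<std::vector<int>> counts(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (in[y][x] != 0) counts[y][x] = countNeighbors(in, x, y);

    // Pass 2: remove a pixel only if it is below the threshold AND none of its
    // neighbors meets the threshold either.
    Grid out = in;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (in[y][x] == 0 || counts[y][x] >= threshold) continue;
            bool supported = false;
            for (int dy = -1; dy <= 1 && !supported; ++dy) {
                for (int dx = -1; dx <= 1 && !supported; ++dx) {
                    if (dx == 0 && dy == 0) continue;
                    const int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && ny >= 0 && nx < w && ny < h &&
                        in[ny][nx] != 0 && counts[ny][nx] >= threshold) {
                        supported = true;
                    }
                }
            }
            if (!supported) out[y][x] = 0;
        }
    }
    return out;
}
```

Because both passes read from the original image, removing one pixel can no longer turn its neighbor into a fresh end-point, so a one pixel wide vertical line keeps its ends while an isolated speck still goes away.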

As previously mentioned, one problem with counting crossings to see how many times a pixel can be visited (in the case of stroke intersections) was that, even when the image is thinned, there can be pixels that, in their 8-pixel neighborhood, don't have enough black-to-white transitions to be flagged as a crossing. Based on the Liu paper, I came up with an alternative criterion that also counts black neighbors. Basically, for a pixel to be marked as supporting two visits, it must pass at least one of these tests:

4 or more black to white transitions
3 black to white transitions and 4 or more black neighbors
2 black to white transitions and 5 or more black neighbors
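The three tests above can be sketched as follows. The Neighborhood type (the 8 neighbors listed clockwise, starting at the top) and the function names are just for illustration.

```cpp
#include <array>

// The 8 neighbors of a pixel in clockwise order starting at north; true = black.
using Neighborhood = std::array<bool, 8>;

// Number of black-to-white transitions seen while walking the neighbors clockwise
// (wrapping around from the last neighbor back to the first).
int blackToWhiteTransitions(const Neighborhood& n) {
    int transitions = 0;
    for (int i = 0; i < 8; ++i) {
        if (n[i] && !n[(i + 1) % 8]) ++transitions;
    }
    return transitions;
}

int blackNeighbors(const Neighborhood& n) {
    int count = 0;
    for (bool black : n) {
        if (black) ++count;
    }
    return count;
}

// True if the pixel passes at least one of the three tests listed above.
bool supportsTwoVisits(const Neighborhood& n) {
    const int t = blackToWhiteTransitions(n);
    const int b = blackNeighbors(n);
    return t >= 4 || (t == 3 && b >= 4) || (t == 2 && b >= 5);
}
```

For the problematic 101 / 111 / 101 pattern, the center pixel has 2 transitions and 6 black neighbors, so it passes the third test, whereas a pixel in the middle of a plain vertical line (2 transitions, 2 black neighbors) does not.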

When following pixels to determine which ones make up a stroke, to make sure that we went through a crossing in the right direction, we would first look for the next pixel in the same direction as the previous one that had been found. So for example, in the case on the left, the two strokes would be separated "naturally" (red and green denote the two strokes, yellow pixels are shared):

  1      1
  1      1
11111 111111
  1     1
  1     1

However, in the case of the pattern on the right, the stroke has no choice but to make a turn, and then it continues along that path, with the net separation being two unnatural strokes that each make a sudden turn.

Despite being on my third paragraph describing the problem, the actual fix was two lines long. If we keep track of the average direction for the past several points, then local changes in the direction will not have such disproportionate impact. Given some decay constant K, the new direction D' and the average direction D, the update function is D = D * K + D' * (1.0 - K)
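Spelled out as code, the update is just an exponential moving average over direction vectors. Vec2 and the function name are placeholders, and the value of K in the usage note is an arbitrary pick.

```cpp
// Hypothetical 2D direction vector.
struct Vec2 {
    double x;
    double y;
};

// D = D * K + D' * (1.0 - K), applied per component.
// k close to 1.0 means a long memory; k = 0.0 falls back to using only the
// most recent step direction.
Vec2 updateAverageDirection(const Vec2& average, const Vec2& latest, double k) {
    Vec2 result;
    result.x = average.x * k + latest.x * (1.0 - k);
    result.y = average.y * k + latest.y * (1.0 - k);
    return result;
}
```

For example, with K = 0.75, an average direction of (1, 0) and a sudden step in direction (0, 1), the new average is (0.75, 0.25): still mostly pointing the old way, so the search for the next pixel keeps favoring the original heading.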

Putting it together #

Date: Monday, December 29, 2003

Labels: Thor

[Screenshot: Thor in action]

To commemorate achieving a milestone of sorts, I'm including this screenshot of the four current phases that the program goes through. They are capture, differences, thinning (and separation into sub-images) and stroke extraction. In the stroke images, control points are in red, strokes in random colors to help differentiate them.

As was mentioned yesterday, fixing the stroke collapsing required changing the error function to measure deltas instead of overall errors. Currently, given a point P, we find the active points preceding and succeeding it (A and B), and compute the difference between the error function sampled from A to P and from P to B, and the error function sampled directly from A to B (which would be the case if P were removed). Sampling is uniform: some constant K times along A to P and along P to B, and 2K times along A to B.

One issue is to know when to stop. Absolute error boundaries won't work since bigger strokes can have much more error and still look OK whereas smaller ones won't. Perhaps some normalization by stroke length will work.

Queueing up is in my blood #

Date: Sunday, December 28, 2003

Labels: Thor

Worked on the actual stroke simplification/collapsing today. Ported over collapse queue from mesh simplification assignment (turned it into a real C++ object finally) and made it use an STL vector for its storage instead of a hand-rolled chunked array. Collapsing seems to work to some extent, but a) it's slow and b) it's not quite right.

The speed issues seem to stem from the fact that my error function computes error across the entire stroke (overlap between it and the original thinned pixels). For each point I then store this error value assuming that the point had been removed. What I really need to store is the delta in the function when the point is removed; that way, when I remove a completely unrelated point, I shouldn't have to worry about this one. Not handling this updating properly is also what I think is causing the correctness issues - when I do the most thorough updating possible (update all other entries after one is removed, regardless of connectivity) I get better results.

Stroke junction, what's your [error] function? #

Date: Saturday, December 27, 2003

Labels: Thor

Working on stroke extraction. First step is to build up the unsimplified stroke, which has a control point per pixel. This isn't quite that simple, since we want to handle intersecting and forking strokes too. I decided to use an extended "visited" image, where instead of storing a boolean per pixel, we have the total number of times a pixel can be visited (for crossings this can be more than one obviously). Since we're dealing with a thinned image, this is as simple as counting the number of dark to white transitions as the neighbors are walked clockwise, and dividing by two (rounding down). This works almost all the time, with the exception of cases such as:

101
111
101

Here the center "1" has only two such transitions, but it's still at a crossing. The Liu paper (it might be useful after all) attempts to deal with such points (and other similar patterns) by also counting neighbors, so this may be something that's worth looking into when tweaking this further. For the actual walking along a stroke's pixels, we keep track of the direction that we previously went in, and look in that direction first for the next step. This lets us continue through intersections with the same stroke rather than suddenly branch off.

Implemented my error metric. Right now it regularly samples the stroke (number of samples is based on length), adding up the sum of squared differences between the current pixel and the background (so I would in fact try to maximize the value of this function). My regular sampling right now is based on distances between control points, which works OK when all I'm doing is linear interpolation for the actual stroke, but may be error-inducing if I switch to Catmull-Rom splines or another interpolation method.

I also got around to trying out my old Nikon CoolPix 990, and it doesn't seem to support the kICAMessageCameraCaptureNewImage Image Capture framework message either (had also tried a Pentax Optio 550 and a Casio Exilim EX-S2). This article has a list of cameras that should support this functionality (the "Remote Monitor" feature is based on it), but I've yet to track one down.

Who needs presents when you've got a thesis #

Date: Thursday, December 25, 2003

Labels: Thor

It turned out both of those problems (certain diagonal lines not being thinned, others being removed completely) were caused by the same thing, a bug in the aforementioned c function. It turns out that it checks for the following patterns (*'s can be of any value; the center pixel of each pattern is the current pixel):

* 0 0      0 1 *      * 1 0      0 0 *
1 1 0  or  1 1 0  or  0 1 1  or  0 1 1
0 1 *      * 0 0      0 0 *      * 1 0

These patterns look for two pixel thick diagonal lines. Although such pixels would satisfy the other criteria for not being thinned (e.g. they have more than one white to dark transition as their neighborhood is traversed), they should in fact be gotten rid of. What is interesting is that the Wang paper only uses the first two patterns, even though the latter two are clearly valid as well. Adding them improved results even further.
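For reference, a check against the four patterns can be sketched like this. The Window type and the string encoding of the patterns are my own choices for illustration, not how the paper's c function is written.

```cpp
#include <array>
#include <string>

// A 3x3 window centered on the current pixel, rows top to bottom; true = dark.
using Window = std::array<std::array<bool, 3>, 3>;

// Each pattern is 9 characters, row by row: '1' dark, '0' background, '*' either.
static const std::array<std::string, 4> kDiagonalPatterns = {{
    "*00" "110" "01*",
    "01*" "110" "*00",
    "*10" "011" "00*",
    "00*" "011" "*10",
}};

// True if the window matches any of the four two pixel thick diagonal patterns.
bool matchesDiagonalPattern(const Window& w) {
    for (const std::string& pattern : kDiagonalPatterns) {
        bool match = true;
        for (int i = 0; i < 9 && match; ++i) {
            const char c = pattern[i];
            if (c == '*') continue;  // wildcard, any value is fine
            match = (w[i / 3][i % 3] == (c == '1'));
        }
        if (match) return true;
    }
    return false;
}
```

Keeping the patterns as data makes it easy to start with just the first two (as in the paper) and then add the other two.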

It's now tempting to try and improve the thinning even further (e.g. do a kind of reverse despeckling, so that holes in an otherwise solid area are filled so they can be thinned away) or to tweak the stroke sub-image extraction (e.g. remove very small ones, but not all of them, since periods and whatnot get to be pretty small when they're thinned too -- maybe only those not near any big sub-images). But such refinements can wait; it is now time for stroke extraction.

However, the Liu paper is beginning to seem increasingly dubious, since it's pretty elaborate and somewhat skimpy on some details (parts of it reference another paper by the author that I can't seem to find). Another approach to consider is to mimic the method used for the mesh simplification assignment. I can build up strokes based on the thinned image that initially use every single pixel as a control point. Then, defining the collapse operation as a vertex removal, I can remove as many as I deem necessary in order of increasing error (my error metric will be pixel overlap of stroke and original thinned image). I can play around with different stroke interpolation methods (I have code for linear and Catmull-Rom from 426).

The Thinning Continues #

Date: Wednesday, December 24, 2003

Labels: Thor

The annoying thing about the Wang paper is that it's not very detailed. I can't tell if it's because of space limitations when it was published (it is part of the IEEE Transactions - maybe there's a more detailed version out there), but it leaves a lot of things "as an exercise for the reader" as it were.

For example, one of the things that it requires is the number of "contour loops" (e.g. a line would have one, a hollow circle two, a figure eight three, and so on) present in the image, and their starting points. My first approach was to find an edge pixel in the image, walk along it (the paper's successor function is, given the current and previous edge pixels, walk along clockwise in the neighborhood, starting with the previous pixel, and pick the first dark pixel encountered) and mark all other edge pixels encountered as visited, and to then repeat the process until there's no more edge pixels left. However, this didn't work, because some edge pixels would be skipped over when walking along. For example, in the picture below, the red pixels would be marked off on the first pass, but the green one, though considered an edge pixel, would have to wait for another pass, causing the creation of a spurious "loop".


1100
1110
1110

The better way of doing this was to pick any background pixel, do a flood-fill (marking off any background pixels as visited) and pick any of the edge pixels encountered as the starting point. Then repeat this until there are no more background pixels left.
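A sketch of the flood-fill approach. Two assumptions that the paragraph above doesn't spell out: the background is filled with 4-way connectivity (the usual counterpart to 8-connected strokes), and the image has a background border around the shape so the outside forms a single region. Grid and findContourStarts are made-up names.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// grid[y][x], non-zero = dark (stroke) pixel, zero = background.
using Grid = std::vector<std::vector<std::uint8_t>>;

// Returns one starting edge pixel (x, y) per background region that touches
// the shape; the number of entries is the number of contour loops.
std::vector<std::pair<int, int>> findContourStarts(const Grid& g) {
    const int h = static_cast<int>(g.size());
    const int w = h > 0 ? static_cast<int>(g[0].size()) : 0;
    std::vector<std::vector<bool>> visited(h, std::vector<bool>(w, false));
    std::vector<std::pair<int, int>> starts;
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};

    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (g[y][x] != 0 || visited[y][x]) continue;

            // Flood-fill this background region with an explicit stack.
            bool haveStart = false;
            std::pair<int, int> start(0, 0);
            std::vector<std::pair<int, int>> pending;
            pending.push_back(std::make_pair(x, y));
            visited[y][x] = true;

            while (!pending.empty()) {
                const std::pair<int, int> p = pending.back();
                pending.pop_back();
                for (int i = 0; i < 4; ++i) {
                    const int nx = p.first + dx[i];
                    const int ny = p.second + dy[i];
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if (g[ny][nx] != 0) {
                        // A dark pixel bordering this region is an edge pixel;
                        // the first one found becomes the loop's starting point.
                        if (!haveStart) {
                            start = std::make_pair(nx, ny);
                            haveStart = true;
                        }
                    } else if (!visited[ny][nx]) {
                        visited[ny][nx] = true;
                        pending.push_back(std::make_pair(nx, ny));
                    }
                }
            }
            if (haveStart) starts.push_back(start);
        }
    }
    return starts;
}
```

With this, a line yields one starting point (the outside background), and a hollow ring yields two (outside plus the hole), matching the contour loop counts described earlier.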

The paper is also cryptic in that it provides pseudo-code for the algorithm, but it doesn't say anything about why or how it works. For example, there's a function c that, from my best guess, appears to check for diagonal cases, but I'm not really sure why. It's also not very clear on the termination function when iterating. Just seeing if you pass the same point twice isn't good enough, since you can encounter it from two directions, and not in fact have completed a loop. Testing for a point plus its predecessor (to incorporate direction) seems fundamentally like a good idea, but there's still problems since points can get deleted, and what used to be its predecessor may not be there at all in the next iteration, and then we're stuck in an infinite loop.

The net result of all this is that the thinning seems to mostly work, except it doesn't sometimes thin two pixel thick diagonal lines, and it entirely decimates another type of diagonal lines (not very even/regular ones). The former I think I've tracked to bugs in the mysterious c function, but for the latter I have no idea.

Just like the Stephen King novel #

Date: Tuesday, December 23, 2003

Labels: Thor

Decided yesterday that my haphazard "intuitive" approach to stroke extraction (convolution with that positive/negative kernel, stroke extraction as outlined in my presentation) wasn't really going to work. The next step was to turn to the literature that I had accumulated. Hew99 (Strokes from Pen-Opposed Extended Edges) seemed promising, but his approach of extracting opposed edges and trying to infer stroke direction seemed overly complex and not very well described in any case. He did however mention other approaches, one being the thinning-based Liu97 (Robust Stroke Segmentation Method for Handwritten Chinese Character Recognition). His results looked good, but he didn't provide any details on the thinning part of his algorithm, instead he pointed to Wang89 (A Fast and Flexible Thinning Algorithm). The paper was somewhat cryptic and/or entertaining due to its age (ooo, we shall describe the image as a matrix of pixels, quelle nouvelle idee), but the idea seemed workable.

Before I could get to the thinning, I first had to split up the difference image into sub-images, one per future set of strokes (each being a subset of 8-way connected pixels). In my quest to learn more, I decided to use the STL for a few classes that I'd need for this (hash_set and vector). The idea was fundamentally sound, but using hash_set took longer than expected, since it's not part of the standard STL (SGI extension apparently), and using it requires one to use the __gnu_cxx namespace. What with Xcode's error messages about this not being very clear, it took a while.

Thinning requires one to worry about edge pixels, and this also gives us an easy way to compute stroke thicknesses (2 * shape area / shape perimeter length). This can be used to throw out differences that aren't really strokes (e.g. people), though right now this only works if strokes aren't partially obscured (i.e. not connected). I may have to add color distance to my connectivity criteria (right now it is just "is not a background pixel").
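A small sketch of the thickness estimate on a binary grid. How the perimeter is measured is an assumption here: this version counts the pixel sides that face background (or the image border), which is one of several reasonable choices. Grid and estimateThickness are made-up names.

```cpp
#include <cstdint>
#include <vector>

// grid[y][x], non-zero = part of the shape.
using Grid = std::vector<std::vector<std::uint8_t>>;

// Estimates stroke thickness as 2 * area / perimeter.
// Area = number of shape pixels; perimeter = number of pixel sides that are
// exposed to background or to the image border (an assumed definition).
double estimateThickness(const Grid& g) {
    const int h = static_cast<int>(g.size());
    const int w = h > 0 ? static_cast<int>(g[0].size()) : 0;
    auto filled = [&](int x, int y) {
        return x >= 0 && y >= 0 && x < w && y < h && g[y][x] != 0;
    };

    long area = 0;
    long perimeter = 0;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (!filled(x, y)) continue;
            ++area;
            if (!filled(x - 1, y)) ++perimeter;
            if (!filled(x + 1, y)) ++perimeter;
            if (!filled(x, y - 1)) ++perimeter;
            if (!filled(x, y + 1)) ++perimeter;
        }
    }
    return perimeter == 0 ? 0.0 : 2.0 * static_cast<double>(area) / static_cast<double>(perimeter);
}
```

For a long stroke of width w and length L, the area is w * L and the perimeter is about 2 * L, so the ratio comes out close to w. A blob like a person has a much larger area relative to its perimeter, which is what makes the value usable as a filter.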

Monkey see, monkey do #

Date: Monday, December 22, 2003
