So there are 5,097 distinct genre strings in the database. Started by mapping the more manageable 500 or so that have turned up in ID3 tags, and put together a reasonably pretty web interface to make that easier. It works well enough, with reasonable performance. Later added about half of the freedb genres (bringing the total to 2,356 mappings), and the net result is that 38% of artists are still uncategorized. I seem to have hit the point of diminishing returns: the last 1,000 mappings I added categorized only 150 more of the 14,000 artists (about 1%).

Will probably finish the mappings anyway, but should look into obtaining more datasets (Amazon?) and/or cleaning up the artist list (e.g. collapsing "A with B", "A vs. B" and so on). Should also make genre estimation merge more generic estimates with more specific ones, though I have to figure out the risk of one (or a few) bad specific mappings skewing the result.
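A minimal Python sketch of the two cleanup ideas above, under stated assumptions rather than the actual schema. `collapse_artist` keeps only the text before a collaboration marker ("with", "vs.", "feat."). `estimate_genre` counts each specific vote toward its generic parent as well, then only commits to a specific genre when it agrees with the winning generic genre and has enough independent support, so a single bad specific mapping can't override the rest. The two-level `PARENT` taxonomy, the raw-string `mapping` dict, and the `min_support`/`min_share` thresholds are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical two-level taxonomy: specific genre -> generic parent.
# Any canonical genre not listed here is treated as generic.
PARENT = {
    "Death Metal": "Metal",
    "Thrash Metal": "Metal",
    "Bebop": "Jazz",
    "Hard Bop": "Jazz",
}

# Hypothetical collaboration markers: everything from the first marker on is dropped.
_COLLAB = re.compile(r"\s+(?:with|vs\.?|feat\.?|featuring)\s+.*$", re.IGNORECASE)


def collapse_artist(name):
    """Reduce 'A with B', 'A vs. B', 'A feat. B' to 'A'."""
    return _COLLAB.sub("", name.strip())


def estimate_genre(raw_genres, mapping, min_support=2, min_share=0.5):
    """Merge generic and specific evidence into a (generic, specific) estimate.

    raw_genres: raw genre strings seen for one artist (ID3 tags, freedb, ...).
    mapping: raw genre string -> canonical genre (generic or specific).
    """
    generic_votes = Counter()
    specific_votes = Counter()
    for raw in raw_genres:
        genre = mapping.get(raw)
        if genre is None:
            continue  # unmapped string contributes no evidence
        if genre in PARENT:
            specific_votes[genre] += 1
            generic_votes[PARENT[genre]] += 1  # specific evidence also backs its parent
        else:
            generic_votes[genre] += 1
    if not generic_votes:
        return None, None

    generic, total = generic_votes.most_common(1)[0]
    # Only specific genres under the winning generic genre are eligible.
    candidates = [(g, n) for g, n in specific_votes.items() if PARENT[g] == generic]
    if candidates:
        specific, n = max(candidates, key=lambda item: item[1])
        # Require several independent votes and a solid share of the parent's votes.
        if n >= min_support and n / total >= min_share:
            return generic, specific
    return generic, None


if __name__ == "__main__":
    mapping = {"metal": "Metal", "Heavy Metal": "Metal", "death metal": "Death Metal",
               "Death": "Death Metal", "jazz": "Jazz", "bebop": "Bebop"}
    print(collapse_artist("Miles Davis with John Coltrane"))
    print(estimate_genre(["metal", "death metal", "Death"], mapping))
    print(estimate_genre(["metal", "Heavy Metal", "bebop"], mapping))
```

The regex will also truncate band names that legitimately contain " with " or " vs. ", so a whitelist of known exceptions would probably be needed before running it over all 14,000 artists. Raising `min_support` trades fewer specific assignments for more protection against bad mappings.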