Word Noise and Site Classification

June 12th, 2010

In a previous post I discussed the problems that traditional website categorization methods have with sites that lack text, links, or anchor text. Here is the other extreme: word noise. Some websites simply have too much content, and that content is spread all over the topical spectrum. There is so much text, so much content, that the categorization […]