Empty URLs

June 22nd, 2010

I briefly discussed the problem of word noise in a previous post. Word noise occurs when a text-based classification system is stymied by too much content. An overabundance of content – especially content spanning unrelated topics – makes classification impossible. If there is business-related content and video-game-related content and gardening […]

Word Noise and Site Classification

June 12th, 2010

In a previous post, I discussed the problems that traditional website categorization methods have with sites that lack text, links, or anchor text. Here is the other extreme: word noise. Some websites simply have too much content, spread across too many topics. There is so much text, so much content, that the categorization […]