I have briefly discussed the problem of word noise in a previous post: a text-based classification system is stymied by too much content. An overabundance of content, especially content spanning varied topics, makes classification impossible. If there is business-related content and video game-related content and gardening […]
Empty URLs
June 22nd, 2010
Word Noise and Site Classification
June 12th, 2010
In a previous post I discussed the problems that traditional website categorization methods have with sites that lack text, links, or anchor text. Here is the other extreme: word noise. Some websites simply have too much content, and it is all over the spectrum. There is so much text, so much content, that the categorization […]