Web Filtering Via URL Classification

June 30th, 2010

Companies and network administrators use web filtering to keep employees from accessing non-productive or objectionable sites.  These are sites that take valuable resources from your network and at the same time expose you and your system to adverse consequences.  These can be anything from fines and lawsuits to hackers, malware, and viruses.  Also, inappropriate […]

Network Administrator By Proxy

June 27th, 2010

Malicious websites are a cornerstone of online criminal activity. Consequently, there has been wide-ranging interest in methods that keep end users from visiting such websites. One answer to this growing concern is automated URL classification. This method uses statistical techniques to uncover the properties of URLs that belong to malicious and malware websites. These […]

DIY URL Categorization

June 26th, 2010

I have a friend who thinks that he can build his own URL database.  “Why should I pay somebody else to classify URL addresses for me?” he says.  This is the kind of guy who likes to organize his video collection by genre and date.  His socks are all in separate, color-coordinated drawers.  He […]

Empty URLs

June 22nd, 2010

I have briefly discussed the problem of word noise in a previous post.  This is when a text-based classification system is stymied by too much content.  An over-abundance of content – especially content from varying topics – makes classification impossible.  If there is business-related content and video game-related content and gardening […]

Port Blocking & URL Categorization

June 20th, 2010

A network administrator may wish to stop users on his network from accessing certain kinds of sites.  For instance, if users are spending a great deal of time on Facebook, then the administrator may wish to block this particular site or “social networking” sites in general. Port blocking and URL categorization is […]

One Limitation of URL Classification

June 18th, 2010

It is true that by building a large database of malicious URL addresses, URL classification can give a network administrator the power to completely block a malicious or suspicious URL.  There is, of course, one small limitation of this method: What happens when a user on your network is the first to stumble upon a […]

What Can Be Done With URL Categorization?

June 16th, 2010

The Internet is growing at an alarming rate.  Traditionally, URL filtering has relied on historical data and semi-permanent classification.  This is beginning to change.  So much more can be done with URL classification today.  Web content can now be classified as you go.  This way you get URL classification that is […]

Given Only the URL, Can We Identify the Topic?

June 14th, 2010

Given just the URL of a website, is it possible to identify the website’s topic?  And if so, why use this method? Certainly you can identify a web page’s topic by its content.  You can use the actual text or hypertext or incoming/outgoing links or a combination of all these and others.  So if we follow […]
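
To make the question concrete, here is a minimal sketch of guessing a topic from nothing but the URL string. It is only an illustration, not the method described in the post: the keyword lists, the topic names, and the helper functions `url_tokens` and `guess_topic` are all hypothetical, and a real classifier would learn its vocabulary from labeled data rather than use a hand-written table.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword lists, invented for this illustration only.
TOPIC_KEYWORDS = {
    "gardening": {"garden", "gardening", "plants", "seeds"},
    "games": {"game", "games", "gaming", "arcade"},
    "business": {"finance", "invest", "business", "market"},
}


def url_tokens(url):
    """Split the host and path of a URL into lowercase alphabetic tokens."""
    parsed = urlparse(url)
    text = (parsed.netloc + parsed.path).lower()
    return [token for token in re.split(r"[^a-z]+", text) if token]


def guess_topic(url):
    """Return the topic with the most keyword hits, or None if nothing matches."""
    tokens = url_tokens(url)
    scores = {
        topic: sum(token in words for token in tokens)
        for topic, words in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_topic("http://www.example.com/garden/seeds.html"))
```

Note that the page itself is never fetched; only the address is inspected, which is why a URL with no recognizable words yields no guess at all.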

Word Noise and Site Classification

June 12th, 2010

In a previous post I discussed the problems that traditional website categorization methods have with sites that lack text, links, or anchor text.  Here is the other extreme:  word noise. Some websites simply have too much content, and it is all over the spectrum.  There is so much text, so much content, that the categorization […]

All Sites Have a URL Address

June 10th, 2010

What do you do with a site that has no text?  If a site has nothing but images, how would you classify this website using traditional methods?  Would a text-based classifier even recognize the site at all? How do you use anchor text classification on a site that has no anchor text?  If the […]