What to look for in a provider?
Seven companies on the market offer professional classification services. Each specializes in different areas of classification, so check which features matter to you and which matter less before making a decision.
What is the expertise of the service provider?
Each provider specializes in one or more areas:
- Porn detection.
- Productivity detection (entertainment sites, nonsense sites).
- 0-Day Virus detection.
- Phishing detection.
- Reputation based detection.
Most providers offer several of these capabilities with a strong emphasis on one, so if you need strong porn detection, a more security-oriented company might not be the best fit.
Komodia’s URL classification emphasis is on porn and productivity detection.
What kind of API is provided?
You’ll need to connect your offering to the classification service, and each provider may give you a different kind of interface:
- Raw protocol – You will need to implement everything yourself, including caching.
- Known protocol – The interface is based on a known protocol, mostly REST. Libraries are available, but you will still need to implement caching.
- DLL or similar – You interface with a component that does most of the work for you.
- Complete solution – No integration is needed; the classification is built into the solution.
Depending on the interface provided, integration can take anywhere from a few hours to a month.
Komodia’s URL classification comes either built into Komodia’s Redirection SDK or as a DLL in case you have your own solution.
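With a raw or REST-style interface, the caching mentioned above is left to you. The following is a minimal sketch of that idea, not any vendor's client: `CachedClassifier`, the `lookup` callable, and the one-hour default lifetime are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CachedClassifier:
    """Wraps a remote lookup with a simple time-limited in-memory cache."""

    def __init__(
        self,
        lookup: Callable[[str], str],          # hypothetical call to the provider
        ttl_seconds: float = 3600.0,           # assumed lifetime of a cached answer
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def classify(self, host: str) -> str:
        key = host.lower()                     # host names are case-insensitive
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]                      # fresh cached answer, no network call
        category = self._lookup(key)           # cache miss or expired entry
        self._cache[key] = (category, now)
        return category
```

A DLL or complete solution would typically hide this step; with a protocol-level interface, something along these lines is part of the integration effort.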
Detection per URL or per domain?
The term “classification per URL” is overused in the industry. It gives the false impression that every URL is classified, and some companies boast that they classify billions of URLs. In reality, only a number of domains are checked per URL; the rest are assigned one category for the entire domain. When you query for domain.com/url1 and domain.com/url2, you get the category of domain.com regardless of the content of url1 and url2, yet the provider still counts those URLs as classified, hence the “billions” of URLs classified.
There are a number of options here:
- Domain based detection – Only the domain is checked; sub domains are not. Sites like Blogger and other blogging platforms are not classified per blog but are treated as a blogging site in general, so a “bad” sub blog will not be detected.
- Domain and sub domain detection – Sub domains are dynamically detected and classified. Some providers also support dynamic URL detection for sites like “Google Sites”. Furthermore, some very high-profile sites have only one sub section that is porn, so the classifier needs to be able to detect just the porn URLs out of the entire site.
- Search engine classification – Classifies a search URL and tells you the category of the search. This lets you enforce a stronger policy than the search engine’s own protection (like safe search), which matters because some high-profile sites, especially image hosting sites, don’t provide safe search.
- Google Translate bypass – The ability to classify a site or phrase being translated by Google; translation is often used to bypass existing solutions.
- Dynamic classification – The ability to request that a specific page be classified on the fly. This can be useful for sites like YouTube and Vimeo, where you might want to classify a single video.
Komodia’s URL classification provides: per domain and sub domain detection, dynamic URL detection for a number of sites with possible mixed content, Google Translate and search engine classification, and dynamic classification.
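The difference between domain, sub domain and URL level detection can be sketched as a most-specific-match lookup. This is only an illustration under assumed data: `classify_url`, `url_rules` and `host_rules` are hypothetical, and a real provider's database and matching logic will differ.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def classify_url(
    url: str,
    url_rules: Dict[str, str],    # hypothetical "host/path" -> category overrides
    host_rules: Dict[str, str],   # hypothetical host -> category table
) -> Optional[str]:
    """Return the most specific category known for a URL, or None."""
    parts = urlsplit(url)         # the URL must include a scheme, e.g. http://
    host = (parts.hostname or "").lower()
    path = parts.path or "/"

    # 1. URL level: the longest matching path rule on this exact host wins.
    best = None
    for rule, category in url_rules.items():
        rule_host, _, rule_path = rule.partition("/")
        rule_path = "/" + rule_path.rstrip("/")
        if rule_host != host:
            continue
        # Match the path itself or anything below it, not mere string prefixes.
        if path == rule_path or path.startswith(rule_path + "/"):
            if best is None or len(rule_path) > best[0]:
                best = (len(rule_path), category)
    if best is not None:
        return best[1]

    # 2. Sub domain level, then parent domains (the bare TLD is never tried).
    labels = host.split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in host_rules:
            return host_rules[candidate]
    return None
```

A purely domain based service corresponds to keeping only the last step with no sub domain entries, which is why a “bad” blog on a shared blogging host goes undetected there.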
True per page classification
As stated in the previous section, most vendors claim to have this ability, but in reality they don’t. The easiest way to test it is to classify individual Wikipedia pages and see whether the categories differ per topic.
Per page classification is needed either by advertising companies that want to increase CPM and profitability, or by parental control products that want an even higher level of protection.
Komodia’s URL classification lets you choose whether to receive the domain category or the category of a single web page.
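The Wikipedia test described above can be automated with a few lines. This is a rough sketch: `classify` stands for whatever call your provider exposes and is a hypothetical placeholder, as is the function name `looks_per_page`.

```python
from typing import Callable, Iterable, Set


def looks_per_page(classify: Callable[[str], str], urls: Iterable[str]) -> bool:
    """Return True if pages of one site receive more than one category.

    `classify` is a hypothetical wrapper around the provider's lookup.
    Pass URLs from a single domain whose pages cover clearly different topics.
    """
    categories: Set[str] = {classify(url) for url in urls}
    return len(categories) > 1
```

If every page of a large, mixed-topic site comes back with the same category, the service is most likely answering per domain. The pages you pick must genuinely differ in topic for the result to mean anything.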
Some vendors provide “impartial” tests for accuracy. We like to quote a line often attributed to former British prime minister Benjamin Disraeli: “There are three kinds of lies: lies, damned lies, and statistics.” We reviewed those tests, and each was rigged to show the strength of the vendor against the competition and hide its weaknesses.
We ran some tests ourselves on adult web sites and measured 99.99% accuracy for our service vs. 96% for the best vendor we compared against (some had 80% accuracy). Keep in mind that we can’t name any names.
You can request a trial and test the accuracy yourself.
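When testing accuracy during a trial, it helps to break the result down per category, since a single overall figure can hide a weak area. The sketch below assumes you have your own hand-labeled sample; `classify` and `accuracy_by_category` are hypothetical names, not part of any vendor's API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def accuracy_by_category(
    classify: Callable[[str], str],            # hypothetical provider lookup
    labeled: Iterable[Tuple[str, str]],        # (url, expected category) pairs
) -> Dict[str, float]:
    """Return, per expected category, the fraction of URLs classified correctly."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for url, expected in labeled:
        totals[expected] += 1
        if classify(url) == expected:
            correct[expected] += 1
    return {category: correct[category] / totals[category] for category in totals}
```

Running the same labeled sample against each candidate service gives numbers you can compare directly, using sites that matter to your own users.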
New site detection
There are two ways to handle new sites:
- Wait for a manual review or for the next scan to occur. Users are unable to visit the site for up to 48 hours, which reduces client satisfaction.
- Classify automatically. Users may experience a delay of a few seconds before using the site.
Komodia’s URL classification classifies new URLs automatically.
Coverage is the number of sites in the database. Beyond a certain number, such as the top 1 million Alexa sites, it adds no value. Some companies boast of having x million sites in their database as a competitive advantage. Coverage is critical when a provider is unable to classify on the fly; in that case a large database is important to avoid customer frustration. But unless the database holds all the sub sites of all major blog hosting companies (for example Blogspot), this number has no meaning.
Komodia holds about 6 million domains in its database. Any new domain the servers encounter is classified on the fly and added to the domain database.
Database freshness means how recent the data is. It is mostly relevant for domains that were not renewed and have since been bought by domain parking companies; these sites usually display ad links, which can often contain adult content. A fresh database is able to detect these parked domains, as well as sites that changed their content (for better or worse).
Komodia reclassifies the entire database every 7 days.
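As a rough illustration of what a freshness policy means in practice, the sketch below picks the domains whose last classification is older than a chosen maximum age. It is not a description of Komodia's implementation; `stale_domains` and the timestamp table are hypothetical, and the seven-day default simply mirrors the interval mentioned above.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_domains(
    last_classified: Dict[str, datetime],      # hypothetical domain -> last scan time
    now: datetime,
    max_age: timedelta = timedelta(days=7),    # assumed reclassification interval
) -> List[str]:
    """Return, in sorted order, the domains due for reclassification."""
    return sorted(
        domain
        for domain, scanned_at in last_classified.items()
        if now - scanned_at >= max_age
    )
```

A domain that lapsed and was picked up by a parking company would surface in such a list within one interval, which is the window during which its old category can be wrong.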
Some vendors require a setup fee. A setup fee can be a payment unrelated to the user count, or an upfront deposit against a future user count.
Komodia does not ask for setup fees; payment is on a monthly basis.
Some vendors require an annual commitment (without granting any price discount), usually in combination with a setup fee.
Komodia does not require any form of annual commitment; you can disconnect from the service with 30 days’ notice.
Under some conditions it makes sense to license the technology, and Komodia provides the option to license its URL classification technology.