Adaptive Lexical Analysis
Adaptive Lexical Analysis is at the heart of the Cyberoam Web Categorization Engine.
In this technology, sample documents/webpages are fed into the engine, which examines lexical structures such as word frequency and the positions of words relative to each other. Once the engine is trained on this set of documents/webpages, it categorizes new webpages/websites by looking for lexical structures similar to those in the documents/webpages it was trained on. It then assigns each categorized webpage/website a Probability Index that reflects the level of confidence in the categorization.
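The sketch below makes the two lexical structures concrete. It counts word frequencies and ordered word pairs that appear close together on a page. It is a minimal illustration under stated assumptions: the function names `tokenize` and `lexical_features`, the pair-window size, and the `first__second` feature encoding are all hypothetical and are not taken from the Cyberoam engine.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_features(text: str, window: int = 2) -> Counter:
    """Hypothetical feature extractor: word frequencies plus ordered word pairs
    that occur within `window` positions of each other."""
    words = tokenize(text)
    features = Counter(words)  # word frequency
    for i, first in enumerate(words):
        # Pair each word with the next `window` words to capture relative position.
        for second in words[i + 1 : i + 1 + window]:
            features[f"{first}__{second}"] += 1
    return features


features = lexical_features("Free online poker. Play online poker now!")
print(features["online__poker"])  # 2
```

Pair features let two pages with the same vocabulary but different word arrangements produce different feature sets. The window size trades that positional sensitivity against the number of features stored.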
Cyberoam and Adaptive Lexical Analysis
Cyberoam uses Adaptive Lexical Analysis to categorize webpages/websites.
Cyberoam has an automated process whereby its Categorization engine accepts new webpages and websites from our crawlers, spiders, and customer feedback (automated and manual). The engine then uses the Adaptive Lexical Analysis technique to categorize the pages and rate them according to the level of confidence in the categorization. Human reviewers then review the pages, and the training set is updated with the results.
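The review loop can be sketched as follows. This sketch assumes that only pages whose Probability Index falls below a confidence cut-off are routed to human reviewers, which this document does not state. `Pipeline`, `REVIEW_THRESHOLD` and its value of 0.8, and the URLs and category names in the usage lines are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical cut-off: pages scored below this Probability Index go to human review.
REVIEW_THRESHOLD = 0.8


@dataclass
class Pipeline:
    """Hypothetical review loop around an already-trained categorization engine."""
    training_set: list = field(default_factory=list)  # (text, category) pairs
    review_queue: list = field(default_factory=list)  # (url, text, category, probability)

    def ingest(self, url, text, category, probability):
        """Route a page the engine has categorized, based on its Probability Index."""
        if probability < REVIEW_THRESHOLD:
            self.review_queue.append((url, text, category, probability))
            return "queued for review"
        self.training_set.append((text, category))
        return "accepted"

    def human_review(self, url, verified_category):
        """Record a reviewer's verdict and add the page to the training set."""
        for i, (queued_url, text, _, _) in enumerate(self.review_queue):
            if queued_url == url:
                del self.review_queue[i]
                self.training_set.append((text, verified_category))
                return True
        return False  # the URL was not waiting for review


pipeline = Pipeline()
print(pipeline.ingest("http://example.com/a", "online poker casino", "Gambling", 0.95))  # accepted
print(pipeline.ingest("http://example.com/b", "poker strategy book", "Gambling", 0.55))  # queued for review
pipeline.human_review("http://example.com/b", "Shopping")
print(len(pipeline.training_set))  # 2
```

Routing on confidence concentrates reviewer effort on the pages the engine is least sure about. A real deployment might also sample high-confidence pages for spot checks.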
To begin with, a set of sites was categorized correctly, and the probability of each site belonging to a category was calculated. Each categorized webpage/website was then assigned a Probability Index according to the level of confidence in the categorization. These sites formed the initial training set for the Categorization engine.
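This document does not say how the probability is computed. One standard way to turn training counts into a per-category probability is multinomial Naive Bayes with Laplace smoothing, shown here purely as an assumed model. Under this assumption, the Probability Index is taken to be the largest posterior probability. The symbols are defined in the comments and are illustrative only.

```latex
P(c \mid d) = \frac{P(c) \prod_{w \in V} P(w \mid c)^{n_{w,d}}}
                   {\sum_{c'} P(c') \prod_{w \in V} P(w \mid c')^{n_{w,d}}},
\qquad
P(w \mid c) = \frac{N_{w,c} + 1}{N_c + |V|},
\qquad
\text{Probability Index}(d) = \max_{c} P(c \mid d)
% n_{w,d}: count of word w in page d
% N_{w,c}: count of w in training pages of category c;  N_c = \sum_{w} N_{w,c}
% V: training vocabulary;  P(c): fraction of training pages labelled c
```

The "+1" smoothing term prevents a single word that never appeared in a category's training pages from driving that category's probability to zero.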
When a new or uncategorized site comes in, the Categorization engine applies lexical analysis to its entire content. It calculates the probability of the site belonging to a particular category based on the site's similarity to the sites previously used for training. Once the site has been classified, it is used to train the Categorization engine further.
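The classify-then-retrain loop can be sketched as an incremental classifier. This is a minimal sketch that assumes multinomial Naive Bayes over word counts. Naive Bayes is a well-known technique swapped in for illustration and is not necessarily the one Cyberoam uses. `LexicalClassifier`, `learn`, and `classify` are hypothetical names. `classify` returns the most probable category together with its posterior probability, which is used here as the Probability Index.

```python
import math
import re
from collections import Counter, defaultdict


class LexicalClassifier:
    """Hypothetical incremental multinomial Naive Bayes classifier over word counts."""

    def __init__(self):
        self.doc_counts = Counter()              # training pages per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    def learn(self, text, category):
        """Fold one labelled page into the training statistics."""
        words = self.tokenize(text)
        self.doc_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocabulary.update(words)

    def classify(self, text):
        """Return (most probable category, its posterior probability)."""
        words = [w for w in self.tokenize(text) if w in self.vocabulary]  # ignore unseen words
        total_pages = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        log_scores = {}
        for category, n_pages in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_pages / total_pages)  # log prior P(c)
            for word in words:
                # Laplace-smoothed log P(w | c)
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            log_scores[category] = score
        # Normalise in log space (subtract the max) to obtain P(c | d).
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


clf = LexicalClassifier()
clf.learn("online poker casino bets", "Gambling")
clf.learn("casino slots jackpot bets", "Gambling")
clf.learn("football match score league", "Sports")

category, probability_index = clf.classify("poker bets tonight")
print(category, round(probability_index, 2))  # Gambling 0.88
clf.learn("poker bets tonight", category)  # the newly classified site trains the engine further
```

Because the model stores only counts, folding a newly classified site back in is a cheap `learn` call, which matches the continuous training described above. In practice, one might retrain only on reviewed or high-confidence pages to avoid reinforcing misclassifications.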
Document Version - 1.0-18/11/2007