Website Categorization process

Adaptive Lexical Analysis

 

Adaptive Lexical Analysis is at the heart of the Cyberoam Web Categorization Engine.

 

In this technology, sample documents/webpages are fed into the engine, which examines lexical structures such as the frequency of words and the position of words with respect to each other. Once the engine is trained on this set of documents/webpages, it categorizes webpages/websites by looking for lexical structures similar to those in the documents/webpages it was trained on. It then assigns a Probability Index to each categorized webpage/website according to its level of confidence in the categorization.
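The paragraph above names two kinds of lexical structure: how often words occur, and where words sit relative to each other. The following Python sketch turns a page into features of both kinds. It is illustrative only: the lowercase tokenizer, the `w:` prefix for word counts, the `p:` prefix for word pairs, and the `window` size are all assumptions, since the article does not document Cyberoam's actual feature set.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: lowercase the text and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_features(text: str, window: int = 2) -> Counter:
    """Return hypothetical word-frequency and word-proximity features for one page.

    - "w:<word>"    counts how often each word occurs (frequency of words).
    - "p:<a>|<b>"   counts how often word b appears within `window` positions
                    after word a (position of words relative to each other).
    """
    tokens = tokenize(text)
    features = Counter(f"w:{t}" for t in tokens)
    for i, first in enumerate(tokens):
        for second in tokens[i + 1 : i + 1 + window]:
            features[f"p:{first}|{second}"] += 1
    return features


if __name__ == "__main__":
    feats = lexical_features("Online poker and online casino games")
    print(feats["w:online"])        # 2
    print(feats["p:online|poker"])  # 1
```

Counting word pairs inside a small window is one simple way to capture word order; a real engine may use richer positional information.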

 

Cyberoam and Adaptive Lexical Analysis

 

Cyberoam uses Adaptive Lexical Analysis for categorizing the webpages/websites.


Cyberoam has an automated process whereby its Categorization engine accepts new webpages and websites from our crawlers, spiders, and customer feedback (both automated and manual). The engine then uses the Adaptive Lexical Analysis technique to categorize the pages and rate them according to the level of confidence in the categorization. Human reviewers then take over: they review the pages, and the training set is updated with the results.
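The loop described in this paragraph (automatic categorization with a confidence rating, followed by human review that feeds the training set) can be sketched as a small review queue. This is a hypothetical illustration: the `classify` callable, the `PendingReview` record, and the choice to show reviewers the least confident verdicts first are assumptions, not documented Cyberoam behavior.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable

# Assumed classifier interface: page text -> (category, probability index in [0, 1]).
Classifier = Callable[[str], tuple[str, float]]


@dataclass(order=True)
class PendingReview:
    confidence: float                      # the heap orders items on this field only
    url: str = field(compare=False)
    text: str = field(compare=False)
    category: str = field(compare=False)   # category proposed by the engine


@dataclass
class CategorizationPipeline:
    classify: Classifier
    training_set: list[tuple[str, str]] = field(default_factory=list)  # (text, category)
    queue: list[PendingReview] = field(default_factory=list)

    def ingest(self, url: str, text: str) -> PendingReview:
        """Automatic step: categorize a page from a crawler, spider, or customer feedback."""
        category, confidence = self.classify(text)
        item = PendingReview(confidence, url, text, category)
        heapq.heappush(self.queue, item)
        return item

    def review(self, decide: Callable[[PendingReview], str]) -> None:
        """Human step: confirm or correct each verdict, least confident first,
        then add the reviewed page to the training set."""
        while self.queue:
            item = heapq.heappop(self.queue)
            self.training_set.append((item.text, decide(item)))
```

Here `decide` stands in for a human reviewer: it receives the engine's proposal and returns the category the reviewer settles on, which is what ends up in the training set.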

 

To begin with, a set of sites was categorized correctly, the probability of each site belonging to a category was calculated, and a Probability Index was assigned to each categorized webpage/website according to the level of confidence in the categorization. This formed the initial training set for the Categorization engine.
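The article does not say how this probability is calculated. Assuming a multinomial naive Bayes model (an illustrative choice, not a documented Cyberoam detail), the probability that a site d belongs to category c, where n_w(d) is the number of times word w occurs in d, could be written as:

```latex
P(c \mid d) = \frac{P(c) \prod_{w \in V} P(w \mid c)^{n_w(d)}}
                   {\sum_{c'} P(c') \prod_{w \in V} P(w \mid c')^{n_w(d)}},
\qquad
P(w \mid c) = \frac{N_{w,c} + 1}{\sum_{w' \in V} N_{w',c} + |V|},
\qquad
\text{Probability Index}(d) = \max_{c} P(c \mid d)
```

Under these assumptions, P(c) is the fraction of training sites in category c, N_{w,c} is the number of times word w appears in training sites of category c, V is the training vocabulary, and the add-one term keeps a word that never appeared in a category from forcing that category's probability to zero.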

 

When a new or uncategorized site comes in, the Categorization engine applies lexical analysis to the site's entire content and calculates the probability of the site belonging to a particular category based on its similarity to the sites previously used for training. Once the classification is done, the site is used to train the Categorization engine further.
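The classify-then-retrain step can be sketched with a multinomial naive Bayes model over word counts. Naive Bayes is swapped in here purely for illustration, since the article does not name the model; the class and method names (`LexicalCategorizer`, `learn`, `probabilities`, `categorize`) are hypothetical, and the sketch uses single-word features only.

```python
import math
import re
from collections import Counter, defaultdict


class LexicalCategorizer:
    """Hypothetical multinomial naive Bayes categorizer trained incrementally.

    Naive Bayes is an illustrative stand-in; the model used by Cyberoam's
    Categorization engine is not documented in this article.
    """

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()  # training sites per category
        self.word_counts: dict[str, Counter] = defaultdict(Counter)  # word counts per category
        self.vocab: set[str] = set()  # every word seen during training

    @staticmethod
    def _tokens(text: str) -> list[str]:
        # Assumed tokenizer: lowercase alphanumeric runs.
        return re.findall(r"[a-z0-9]+", text.lower())

    def learn(self, text: str, category: str) -> None:
        """Add one categorized site to the training set."""
        words = self._tokens(text)
        self.doc_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocab.update(words)

    def probabilities(self, text: str) -> dict[str, float]:
        """Return P(category | site) for every category seen in training."""
        words = [w for w in self._tokens(text) if w in self.vocab]  # unseen words carry no evidence
        total_sites = sum(self.doc_counts.values())
        log_scores = {}
        for category, n_sites in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace (add-one) smoothing
            score = math.log(n_sites / total_sites)  # log prior
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            log_scores[category] = score
        # Normalize in log space so long pages do not underflow to zero.
        best = max(log_scores.values())
        weights = {c: math.exp(s - best) for c, s in log_scores.items()}
        total = sum(weights.values())
        return {c: weight / total for c, weight in weights.items()}

    def categorize(self, text: str, learn: bool = True) -> tuple[str, float]:
        """Return (category, probability index); by default, retrain on the result."""
        probs = self.probabilities(text)
        category = max(probs, key=probs.get)
        if learn:
            self.learn(text, category)
        return category, probs[category]


if __name__ == "__main__":
    engine = LexicalCategorizer()
    engine.learn("online casino poker betting jackpot", "Gambling")
    engine.learn("football cricket match score league", "Sports")
    print(engine.categorize("live poker and casino jackpot"))  # ('Gambling', ~0.89)
```

Passing `learn=False` skips the automatic retraining, for example when a human reviewer should confirm the label before it enters the training set. Word-proximity features could be appended to the token list in the same way to capture word positions as well as frequencies.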
 
 
Document Version - 1.0-18/11/2007
 
Article ID: 802