UAH
Computer Science
Research Project

 

 

 

Finding Frequent Webpage Access Patterns

Upcoming Events: TESTING PHASE....

 

 Project Description

      Frequent pattern mining, which discovers frequent patterns in a database, is an important data mining problem with broad applications, including the analyses of customer purchase behavior, web access patterns, scientific experiments, disease treatments, natural disasters, DNA sequences, and so on [5].  For a specific example, suppose we are given a database of DNA sequences for certain species, and a minimum support β, which is a number greater than 0, we want to determine  the subsequences which appear more than β times in the DNA sequence database. In biology, these frequent DNA subsequences often correspond to a common ancestor or a common virus.

     There are many different kinds of frequent pattern mining problems. In particular, we have frequent sequential pattern mining problems and frequent subtree pattern mining problems. Frequent sequential pattern mining discovers frequent subsequence patterns in a sequence database, while frequent subtree pattern mining discovers frequent subtree patterns within semi-structured tree databases.  In this project, we are going to study these two frequent pattern mining problems through an example study, searching for frequent webpage access patterns.

     In order to maximize webpage access efficiency (such as posting advertisements in certain frequently accessed webpages), the designers often want to know which webpages are accessed more often than others. First, we will study a simpler version of this problem. Note that we can identify each webpage by its unique URL. If we assume that webpage users never use the “back button” in their browser when they surf webpages, then each webpage access history can be viewed as a sequence of URLs. Therefore, the frequent webpage access pattern problem reduces to the problem of mining frequent sequential pattern within a sequence database.

    Our second approach is more general, we assume that webpage users do use the back button when they browse webpages, hence each webpage access history can be viewed as a labeled tree. In this tree, the labels are the URLs for the webpages, the root is the webpage where the user starts to surf, and the internal nodes of the tree correspond to the webpages where users use the “back button” to return to other links in that page. Applying this model, the frequent webpage access pattern problem is reduced to the frequent subtree pattern mining problem. Our goal in this project is to design and implement efficient algorithms to find frequent webpage access patterns from a given database of access history.

 

 

Comment Book | Add a Comment | Search | References | Implementation_Code.htm

Copyright © 2006 CSRP
All Rights Reserved