Abstract
Existing Web usage mining techniques are based on an arbitrary division of the data (e.g. “one log per month”) or guided by presumed results (e.g. “what is the customers’ behaviour for the period of Christmas purchases?”). These approaches have two main drawbacks. First, they depend on this arbitrary organization of the data. Second, they cannot automatically extract “seasonal peaks” from the stored data. In this paper, we propose a specific data mining process (in particular, to extract frequent behaviour patterns) that reveals the densest periods automatically. From the whole set of possible combinations, our method extracts the frequent sequential patterns related to the extracted periods. A period is considered dense if it contains at least one frequent sequential pattern for the set of users connected to the website in that period. Our experiments show that the extracted periods are relevant and that our approach extracts both the frequent sequential patterns and the associated dense periods.
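The density criterion stated in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes that each connected user is represented by one ordered list of visited pages, that candidate patterns are limited to a fixed length, and that support is the fraction of users whose navigation contains the pattern as a subsequence. The names `is_subsequence`, `frequent_patterns`, `is_dense`, `min_support`, and the sample page names are hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(page in remaining for page in pattern)


def frequent_patterns(sequences, min_support, length=2):
    """Brute-force search for patterns of a fixed length.

    A pattern is kept when at least `min_support` (a fraction between 0 and 1)
    of the user sequences contain it as a subsequence. This exhaustive search
    is only meant for tiny illustrative inputs.
    """
    candidates = set()
    for seq in sequences:
        # combinations() preserves positional order, so each tuple is an
        # ordered subsequence of the user's navigation.
        candidates.update(combinations(seq, length))
    threshold = min_support * len(sequences)
    return {
        cand
        for cand in candidates
        if sum(is_subsequence(cand, seq) for seq in sequences) >= threshold
    }


def is_dense(sequences, min_support, length=2):
    """A period is dense if it holds at least one frequent sequential pattern."""
    return bool(sequences) and bool(frequent_patterns(sequences, min_support, length))


# Hypothetical navigation sequences of the users connected during two periods.
period_a = [["home", "products", "cart"], ["home", "cart"], ["home", "news", "cart"]]
period_b = [["home", "news"], ["jobs", "contact"], ["products", "cart"]]

print(is_dense(period_a, min_support=0.6))  # every user goes from home to cart
print(is_dense(period_b, min_support=0.6))  # no behaviour is shared
```

In this toy data the first period qualifies as dense because all three users visit `home` and later `cart`, while the second does not because no ordered pair of pages is shared by enough users. How the periods themselves are delimited is not specified in the abstract, so the sketch takes them as given.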
Additional information
Responsible Editor: Chang-shing Perng.
About this article
Cite this article
Masseglia, F., Poncelet, P., Teisseire, M. et al. Web usage mining: extracting unexpected periods from web logs. Data Min Knowl Disc 16, 39–65 (2008). https://doi.org/10.1007/s10618-007-0080-z
DOI: https://doi.org/10.1007/s10618-007-0080-z