Comparative Performance of Interestingness Measures to Identify Redundant and Non-informative Rules from Web Usage Data

Authors and Affiliations

Dilip Singh Sisodia, Riya Singhal, Vijay Kandal

Corresponding email: dssisodia.cs@nitrr.ac.in

Published at: 27 Jan 2018
Volume: IJtech Vol 9, No 1 (2018)
DOI: https://doi.org/10.14716/ijtech.v9i1.1510

Cite this article as:
Sisodia, D.S., Singhal, R., Kandal, V., 2018. Comparative Performance of Interestingness Measures to Identify Redundant and Non-informative Rules from Web Usage Data. International Journal of Technology. Volume 9(1), pp. 201-211

Dilip Singh Sisodia	National Institute of Technology Raipur
Riya Singhal	National Institute of Technology Raipur
Vijay Kandal	National Institute of Technology Raipur

Abstract

Association rules are used to predict frequent web user behaviors from web usage data. These rules are formed from frequent items. As the number of frequent items increases, the number of association rules grows, and many of the generated rules are redundant or non-informative. In this paper, five interestingness measures (cosine, lift, leverage, confidence, and conviction) are compared at a constant value of support, based on the number of redundant and non-informative rules that they produce. Redundant and non-informative rules are a subset of the rules present in the top generated rules. The experimental results suggested that leverage produced the fewest redundant rules in the top rules but also the least informative rules among all measures. Lift produced the highest number of redundant rules but the most informative rules among all the measures.
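
As a point of reference, the five measures can be computed from three relative supports of a rule A -> B. The sketch below uses the standard textbook definitions, which may differ in detail from the formulation used in the paper; the function name `interestingness` and its parameter names are hypothetical and chosen only for illustration.

```python
import math


def interestingness(supp_a, supp_b, supp_ab):
    """Return five measures for a rule A -> B (illustrative sketch).

    supp_a  : fraction of sessions containing the antecedent A
    supp_b  : fraction of sessions containing the consequent B
    supp_ab : fraction of sessions containing both A and B
    All three are assumed to be relative supports in (0, 1].
    """
    confidence = supp_ab / supp_a
    lift = supp_ab / (supp_a * supp_b)
    leverage = supp_ab - supp_a * supp_b
    cosine = supp_ab / math.sqrt(supp_a * supp_b)
    # Conviction is undefined when confidence is 1; infinity is a common convention.
    if confidence >= 1.0:
        conviction = math.inf
    else:
        conviction = (1.0 - supp_b) / (1.0 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "cosine": cosine,
        "conviction": conviction,
    }
```

Ranking the same rule set by each of the returned values in turn, at a fixed minimum support, gives one "top rules" list per measure, which is the kind of comparison the abstract describes.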

Keywords

Association rule mining; Interestingness measures; Non-informative rules; Redundant rules; Weblogs

Conclusion

In this paper, the number of redundant and non-informative rules was determined using five interestingness measures (confidence, conviction, cosine, leverage, and lift) for a constant value of support. The top rules were listed according to the respective values of the interestingness measures, and the number of redundant rules was determined. A rule is considered redundant if it is a subset of a valid rule. It was observed that leverage had the fewest redundant rules but also the least informative rules, as the antecedent or consequent could be combined to form a superset of the rule. Lift and cosine produced similar top rules and showed the most redundant but more informative rules compared to the other measures. Lift showed the most redundant rules, as the valid rule that was an informative rule contained several subsets of the top rules. Confidence and conviction produced similar top rules, which included valid rules that could be combined. This study experimentally confirmed that no measure is consistently better than the others in all circumstances; however, there are situations in which many of these measures are highly correlated with each other. The presented algorithm was used to select a small set of rules in the form of tables so that experts can select the most appropriate measure by examining the small set of tables. The scope of the present work was to identify redundant and non-informative rules from the total generated rules using interestingness measures and to compare the performance of the different measures used for the same purpose. This work can be utilized to identify relationships among the buying patterns of online users, to support decision making in effective e-marketing strategies, to design web-based personalized systems, and to discover other types of patterns.
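
The redundancy criterion stated above ("a rule is considered redundant if it is a subset of a valid rule") can be sketched as follows. This is one possible reading, not the authors' algorithm: it assumes a rule is a subset of another when its antecedent and its consequent are each contained in the other rule's antecedent and consequent. The rule representation (a pair of frozensets) and the function names are hypothetical.

```python
def is_subset_rule(rule, other):
    """True if `rule` is a proper sub-rule of `other` (assumed reading).

    Each rule is a pair (antecedent, consequent) of frozensets of items.
    """
    antecedent, consequent = rule
    other_antecedent, other_consequent = other
    return (
        rule != other
        and antecedent <= other_antecedent
        and consequent <= other_consequent
    )


def redundant_rules(top_rules):
    """Return the rules in `top_rules` that are sub-rules of another listed rule."""
    return [
        rule
        for rule in top_rules
        if any(is_subset_rule(rule, other) for other in top_rules)
    ]


# Hypothetical page identifiers, used only to show the call shape.
example_rules = [
    (frozenset({"/index"}), frozenset({"/news"})),
    (frozenset({"/index", "/about"}), frozenset({"/news"})),
    (frozenset({"/images"}), frozenset({"/history"})),
]
flagged = redundant_rules(example_rules)
```

Under this reading, only the first example rule is flagged, because its antecedent and consequent are both contained in the second rule. Counting the flagged rules in each measure's top list would give a per-measure redundancy count of the kind compared in this paper.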
