SLP Header

Efficient Closure Checking Mechanism for Cube Miner Algorithm

IJCSEC Front Page

Abstract
This paper, we propose an efficient closure checking mechanism for three dimensional closed pattern mining from 3D datasets. We propose Cube Miner+ which extends the cube miner algorithm and incorporates the proposed closure checking scheme. We also aim at optimizing the existing algorithm by incorporating additional optimization techniques to reduce computation time.
Keyword:Data mining, closed patterns, mining methods, Algorithms.
I.Introduction
Mining frequent patterns or item sets is a fundamental and essential problem in many data mining applications. Frequent item sets play an essential role in many data mining tasks that tries to find interesting patterns from databases, such as association rules, correlations, sequences, episodes, classifiers, clusters and many more. The task of discovering all frequent item sets is quite challenging. The search space is exponential in the number of items occurring in the database. The support threshold limits the output to a hopefully reasonable subspace. Also, such databases could be massive, containing millions of transactions, making support counting a tough problem. The mining of the complete set of frequent item sets will lead to a number of huge and redundant item sets. Fortunately, this problem can be reduced to the mining of frequent closed item sets, which results in a much smaller number of item sets. A frequent closed item set is either a maximal frequent item set, or a frequent item set whose support is higher than the supports of all its proper supersets.
Frequent pattern mining is a fundamental step to several essential data mining tasks. It is an unsupervised mining technique that identifies all subsets of items or attributes frequently occurring in many database records or transactions. The concept of frequent closed pattern mining is proposed to overcome the problems of redundant patterns and rules and to enumerate feasible patterns from very long frequent patterns. Frequent closed pattern mining is found to be efficient in many application domains like market-basket analysis, www usage mining, gene expression data analysis, etc. In a typical gene expression data analysis, items denote gene expression properties in biological situations. Here, the frequent sets denote the sets of genes that are frequently co-regulated and thus can be suspected to participate to a common function within the cells. Frequent Closed Pattern Mining can reveal patterns about how the expression of one gene may be associated with the expression of a set of genes under a set of environments.
Given such information, we can easily infer that the genes involved participate in some kind of gene networks. Moreover, such association rules can be used to relate the expression of genes to their cellular environments and time periods simultaneously. Such associations can help to detect cancer genes in different cancer developing stages, especially when cancer is caused by a set of genes acting together in-stead of a single gene. It is however possible to use the properties of Galois connection to compute the closed sets on the smaller dimension and derive the closed sets on the other dimension. column set. Several algorithms [2] [5] [6] are proposed to mine frequent closed patterns from 2D space. In 3D frequent closed pattern mining (FCP), patterns are generated from 3D datasets comprising of height set, row set and column set.
The 2D frequent set mining problem concerns the computation of sets of attributes that are true together in enough transactions, i.e., given a frequency threshold. The patterns are generated from a 2D dataset which comprises of row set and column set. Several algorithms [2] [5] [6] are proposed to mine frequent closed patterns from 2D space coming up with the 3D datasets. The Frequent closed cube (FCC), which generalizes the notion of 2D frequent closed pattern to 3D context with two approaches called Representative Slice Mining algorithm (RSM) that exploits 2D FCP mining algorithms to mine FCCs and Cube Miner that directly mines from 2D Dataset

References:

  1. Liping, Ji, Tan, K.L. and Tung, A.K.H. ”Mining Frequent Closed Cubes in 3D datasets,” Proc. 32nd int. conference on very large databases, 2006
  2. Besson J, Robardet C and Boulicaut J.F, "Constraint based mining of formal concepts in transactional data, PAKDD’04 pp. 615-624, 2004.
  3. Liping, Ji “Mining Localized Co-expressed Gene Patterns from Microarray Data,” PhD dissertation, School of Computing., Na-tional University of Singapore, June 2006.
  4. Mohammed J. Zaki and Ching-Jui Hsiao, “Efficient Algorithms for Mining Closed Items sets and Their Lattice Structure”,IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, april 2005
  5. Jian Pei, Jiawei Han and Runying Mao, “CLOSET : An Efficient Algorithm for Mining Frequent Closed Item sets”, Intelligent Database Systems research Lab, School of Computing Science, Simon Fraser University, Canada.
  6. Jianyong Wang, Jiawei Han, Jian Pei, “CLOSET+: Searching for the Best Strategies for Mining Frequent Closed Item sets”, Department of Computer Science, New York
  7. Liping Ji, Kian-Lee Tan and Anthony K.H. Tung, “Compressed Hierarchical Mining of Frequent Closed Patterns from Dense Data Sets”, IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 9, sep 2007
  8. Lucchese C, Orlando S and Perego R, "Fast and Memory Efficient Mining of Frequent Closed Item sets", IEEE Transactions on Knowledge and Data Engineering, VOL 18, No 1, pages 21-36, January 2006
  9. Grahne G., Zhu J. “Fast Algorithms for Frequent Itemset Mining Using FP-Trees”, IEEE Transactions on Knowledge and Data Engineering, Vol 17, No 10, pages 1347- 1362, October 2005.
  10. Gao Cong, Kian-Lee Tan, Tung, A. K. H. and Feng Pan, Mining Frequent Closed Patterns in Microarray Data”, ICDM’04, Vol 1, Issue 4, pp: 363-366, Nov 2004
  11. Ben Yahia, S, Hamrouni,T and Mephu Nguifo E, "Frequent Closed item set based Algorithms: A through structural and analytical survey," SIGKDD Explorations, Vol. 8, Issue 1, pages 93-104 2006
  12. Jiawei Han, Jian Pei, Yiwen Yin, Runying Mao, “Mining Frequent Pattern without candidate Generation : A Frequent Pattern Approach” Journal of Data Mining and Knowledge Discovery , Springer, pages 53-87, 2004.
  13. Mingjun Song, Sanguthevar Rajasekaran, "A Transaction Mapping Algorithm for Frequent Itemsets Mining", IEEE Transactions on Knowledge and Data Engineering,, VOL 18, No 4, pages 472-481, April 2006.
  14. D.Burdick,M.Calimlim,J.Flannick,J.Gehrke,Y.Yiu, "MAFIA: A Maximal Frequent Itemset Algorithm", IEEE Transactions on Knowledge and Data Engineering, VOL 17, No 11,Pages 1490 - 1504, November 2005.
  15. R. Agrawal, T. Imielinski, and A. Swami. "Mining association rules between sets of items in large databases", In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 207-216, Washington, DC, May 1993.
  16. R. Agrawal and R. Srikant, "Fast algorithms for mining association rules", In Proceeding of Int. Conf. Very Large Data Bases, pages 487-499, Santiago, Chile, September. 1994.
  17. A. Savasere, E. Omiecinski, and S. Navath, "An efficient algorithm for mining association rules in large databases", In Proc. of Intl. Conf. on Very Large Databases (VLDB), 1995.
  18. N. Pasquier, Y. Bastide, R. Taouil, and L.Lakhal, "Discovering Frequent Closed Itemsets for Association Rules", Proc. 7th Int. Conf. Database Theory (ICDT'99), pages 398-416, January 1999.
  19. M.J. Zaki,C.J.Hsiao, "CHARM: An Efficient Algorithm for Closed Itemset Mining",Proc SIAM Int. Conf. Data Mining, pages 457-473, April 2002.
  20. Dao-l Lin and Zvi M. Kedem, "PINCER-SEARCH: An Efficient Algorithm for Discovering the Maximum Frequent Set", IEEE Trans. on Knowledge and data Engineering, VOL 14, No. 3, pages 553-566, June 2002.
  21. Ahmed Shakil, Frans Coenen, Paul Leng, “Tree based partitioning of data for association Rule Mining”, Knowledge and Information Systems, Springer, Vol 10, N0 3, pages 315-331, October 2006.
  22. A. Veloso, M. Otey, S. Parthasarathy, and W. Meira, “Parallel and distributed frequent itemset mining on dynamic datasets”, In Proc. of the High Performance Computing Conference, HiPC, Hyderabad, India, December 2003.
  23. Yew-kwong Woon, Wee-keong Ng, Ee-Peng Lim, “A Support-ordered Trie for Fast Frequent Itemset Discovery”, IEEE Transactions on Knowledge and Data Engineering, Vol 16, No 7,pages 875-879, July 2004.
  24. Guimei Liu, “Supporting Efficient and Scalable Frequent Pattern Mining”, PhD Thesis, Hong Kong University, May 2005