Monday, 30 September 2013

Paper summary : Mining Association Rules between Sets of Items in Large DBs

Supermarkets need to make important business decisions every day (e.g., what to put on sale, how to design coupons, etc), but databases do not provide functionality to derive association rules from customer transaction information. This paper presents an algorithm to mine such rules in real world datasets.

Given a set of antecedents and one consequent, the algorithm finds rules that have a specified transaction support (the percentage of transactions that contain the union of consequent and antecedent) and a specified confidence (the percentage of transactions that satisfy the antecedent also satisfy the consequent). This is done by finding large itemsets (with a minimum transaction support) and generating rules from these itemsets.

Finding large itemsets is expensive, hence, estimation procedures, pruning, and memory management are essential factors for consideration. The estimation procedure is used to determine the itemsets which should be measured in a passes. The algorithm seeks to balance the number of passes over the data and the number of itemsets measured in a pass. The pruning technquies discard certain itemsets to reduce wastage of unnecessary computational effort, and can prune out itesets as soon as they are generated. Memory management describes the situation when all the itemsets generated in the pass do not fit in memory. By spooling data to the disk, the memory management algorithm guarantees that the algorithm terminates.

The algorithm produces excellent results on the provided dataset. The estimation procedures were accurate and the pruning procedures worked well.

Paper : Mining Association Rules between Sets of Items in Large DBs; Agrawal et. Al., ACM SIGMOD 1993

No comments:

Post a comment