Chapter 2 Association Analysis
Section 6 FP-Growth Algorithm
Page 3 F.I. Generation
Objectives
The objectives of this section are:
to define the FP-Tree data structure
to introduce an alternative algorithm called the FP-Growth Algorithm
to explain the construction of the FP-Tree
to demonstrate how frequent itemsets are generated using this tree
Outcomes
By the time you have completed this section you will be able to:
define a FP-Tree
construct a FP-Tree for a given data set
use a FP-Tree to determine the frequent itemset
explain the major aspects of the FP-Growth Algorithm
Copyright © Rakesh Verma 2009
Frequent Itemset Generation in FP-Growth Algorithm
The FP-Growth Algorithm using the FP-Tree data construct that was discussed on the previous page to generate frequent itemsets. The FP-growth algorithm is interesting because it shows how a compact representation of a transaction data set helps to generate frequent itemsets efficiently. It does this by using a divide and conquer approach to find the frequent itemsets ending in a particular suffix. The four step process is described briefly below.
- Create prefix paths for a particular suffix node. This is done by gathering all the paths containing a particular suffix node. Any path that ends with this suffix is examined.
- Using the prefix path tree determine whether the suffix is frequent. This is done by adding the support counts associated with the node and if the number is greater than or equal to the minsup the node is frequent. If the node isn’t frequent the analysis ends for this suffix.
- Convert the prefix paths into a conditional FP-tree.
- Update the support counts along the prefix paths to reflex the actual number of transactions containing the itemset
- Truncate the prefix paths by removing the nodes of the chosen suffix
- Remove items that may no longer be frequent (if the support count of a particular node is less than minsup it is no longer frequent and should be pruned).
- Repeat I →III for all prefix paths for the chosen suffix.
- Repeat Steps 1-3 for all suffix nodes to determine the frequent itemset for the dataset.
For an example consult your traditional textbook