The objectives of this section are:
to define the FP-Tree data structure
to introduce an alternative algorithm called the FP-Growth Algorithm
to explain the construction of the FP-Tree
to demonstrate how frequent itemsets are generated using this tree
By the time you have completed this section you will be able to:
define a FP-Tree
construct a FP-Tree for a given data set
use a FP-Tree to determine the frequent itemset
explain the major aspects of the FP-Growth Algorithm
The FP-Growth Algorithm is an alternative algorithm used to find frequent itemsets. It is vastly different from the Apriori Algorithm explained in previous sections in that it uses a FP-tree to encode the data set and then extract the frequent itemsets from this tree. This section is divided into two main parts, the first deals with the representation of the FP-tree and the second details how frequent itemset generation occurs using this tree and its algorithm.
A FP-tree is a compact data structure that represents the data set in tree form. Each transaction is read and then mapped onto a path in the FP-tree. This is done until all transactions have been read. Different transactions that have common subsets allow the tree to remain compact because their paths overlap.
The diagram to the right is an example of a best-case scenario that occurs when all transactions have exactly the same itemset; the size of the FP-tree will be only a single branch of nodes.
The worst case scenario occurs when every transaction has a unique itemset and so the space needed to store the tree is greater than the space used to store the original data set because the FP-tree requires additional space to store pointers between nodes and also the counters for each item. The diagram below is an example of how a worst case scenario FP-tree might appear. As you can see, the complexity of the tree grows with the uniqueness of each transaction.
The construction of a FP-tree is subdivided into three major steps.
The slideshow presentation below details a short example of a FP-tree construction