The objectives of this section are:
to introduce alternative representations for frequent itemsets
to define the maximal frequent itemset representation
to define the closed frequent itemset representation
By the time you have completed this section you will be able to:
explain and identify the maximal frequent itemset
explain and identify the closed frequent itemset
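A maximal frequent itemset is a frequent itemset for which none of its immediate supersets is frequent.
An itemset is closed in a data set if there exists no superset that has the same support count as the itemset itself.
A closed frequent itemset is an itemset that is closed and whose support is greater than or equal to minsup.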
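Stated more formally, the definitions can be sketched as follows. The notation is an assumption made here for illustration and is not taken from this section: σ(X) stands for the support count of an itemset X, and Y ranges over itemsets built from the same items.

```latex
\begin{align*}
X \text{ is frequent} &\iff \sigma(X) \ge \mathit{minsup} \\
X \text{ is closed} &\iff \nexists\, Y \supset X : \sigma(Y) = \sigma(X) \\
X \text{ is closed frequent} &\iff X \text{ is closed} \ \wedge\ \sigma(X) \ge \mathit{minsup} \\
X \text{ is maximal frequent} &\iff \sigma(X) \ge \mathit{minsup} \ \wedge\ \nexists\, Y \supset X : \sigma(Y) \ge \mathit{minsup}
\end{align*}
```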
The lattice diagram above shows the maximal, closed and frequent itemsets. The itemsets circled in blue are the frequent itemsets. Those circled with a thick blue line are the closed frequent itemsets, and those circled with a thick blue line and filled with yellow are the maximal frequent itemsets. To determine which of the frequent itemsets are closed, compare the support of each one with the support of its supersets: if any superset has the same support, the itemset is not closed.
For example, ad is a frequent itemset but has the same support as its superset abd, so it is NOT a closed frequent itemset. On the other hand, c is a closed frequent itemset because all of its immediate supersets (ac, bc, and cd) have supports that are less than 3, the support of c.
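The check described above can be sketched in Python. This is a minimal brute-force illustration, not an efficient mining algorithm. The transactions, the `MINSUP` value and the function names are hypothetical and are not the data behind the lattice diagram. Only immediate supersets (one extra item) are tested, which is enough because support can never increase when an item is added.

```python
from itertools import combinations

# Hypothetical toy data set (NOT the one used in the lattice diagram).
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "d"},
    {"c"},
]
MINSUP = 2  # assumed minimum support count


def support(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def classify(transactions, minsup):
    """Return (frequent itemsets with supports, closed frequent, maximal frequent)."""
    items = sorted(set().union(*transactions))

    # Enumerate every non-empty itemset and keep the frequent ones.
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= minsup:
                frequent[candidate] = count

    closed, maximal = set(), set()
    for itemset, count in frequent.items():
        # Immediate supersets: the itemset plus one item it does not contain.
        supersets = [itemset | {i} for i in items if i not in itemset]
        # Closed: no immediate superset has the same support.
        if all(support(s, transactions) < count for s in supersets):
            closed.add(itemset)
        # Maximal: no immediate superset is frequent.
        if all(s not in frequent for s in supersets):
            maximal.add(itemset)
    return frequent, closed, maximal


frequent, closed, maximal = classify(transactions, MINSUP)
print(len(frequent), len(closed), len(maximal))  # 7 4 1
```

On this toy data the closed frequent itemsets are b, c, ab and abc, and the only maximal frequent itemset is abc. The itemset a is frequent but not closed because ab has the same support.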
In the lattice diagram there are a total of 9 frequent itemsets. Of these, 4 are closed frequent itemsets, and 2 of those 4 are maximal frequent itemsets. This brings us to the relationship between the three representations of frequent itemsets.
In conclusion, it is important to point out the relationship between frequent itemsets, closed frequent itemsets and maximal frequent itemsets. As mentioned earlier, closed and maximal frequent itemsets are subsets of the frequent itemsets, but maximal frequent itemsets are the more compact representation because they are a subset of the closed frequent itemsets. The diagram to the right shows the relationship between these three types of itemsets. Closed frequent itemsets are more widely used than maximal frequent itemsets because, when efficiency is more important than space, they provide the support of their subsets, so no additional pass over the data is needed to find this information.
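To illustrate why the closed representation keeps support information, here is a small Python sketch under stated assumptions. The support of any frequent itemset equals the largest support among the closed frequent itemsets that contain it. The `closed_supports` values and the function name `support_from_closed` are hypothetical and are not taken from the diagrams in this section.

```python
# Hypothetical closed frequent itemsets with their support counts
# (illustrative values only, not the ones from the lattice diagram).
closed_supports = {
    frozenset("b"): 4,
    frozenset("c"): 3,
    frozenset("ab"): 3,
    frozenset("abc"): 2,
}


def support_from_closed(itemset, closed_supports):
    """Return the support of a frequent itemset, or None if it is not frequent.

    The support is the maximum support over all closed frequent itemsets
    that contain the given itemset (the itemset itself included).
    """
    candidates = [
        count for closed, count in closed_supports.items() if itemset <= closed
    ]
    return max(candidates) if candidates else None


print(support_from_closed(frozenset("a"), closed_supports))   # 3
print(support_from_closed(frozenset("ac"), closed_supports))  # 2
print(support_from_closed(frozenset("d"), closed_supports))   # None
```

By contrast, if only the maximal frequent itemset abc were stored, we would know that a and ac are frequent but not their supports, and would need another pass over the data to count them.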