The objectives of this section are:
to explain the problem of frequent itemset generation
to introduce the main strategies for solving this problem
to explain the Apriori Principle
to explain the Apriori Algorithm and its various parts
By the time you have completed this section you will be able to:
list the main strategies for solving the frequent itemset generation problem
describe the Apriori Principle
explain the major steps of the Apriori Algorithm
list the requirements for efficient Candidate Generation
distinguish between the methods for efficient Candidate Generation
In order to generate the list of frequent itemsets, one must avoid the brute-force approach described in the previous section, because scanning the whole data set to find the support count of every candidate itemset is very expensive. Some of the strategies used to address this problem are:
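To make the cost of brute force concrete, the following sketch (using a hypothetical toy transaction database, not data from this course) enumerates every non-empty candidate itemset and scans the full database for each one. With only 6 distinct items there are already 2^6 - 1 = 63 candidates, and the count doubles with each additional item.

```python
from itertools import combinations

# Hypothetical toy transaction database for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

items = sorted(set().union(*transactions))

def support_count(itemset):
    """Brute force: scan the whole database once per candidate itemset."""
    return sum(1 for t in transactions if itemset <= t)

# Enumerate all 2^d - 1 non-empty candidate itemsets and count each one:
# roughly O(2^d * N * w) work for d items, N transactions of width w.
candidates = [frozenset(c)
              for k in range(1, len(items) + 1)
              for c in combinations(items, k)]
supports = {c: support_count(c) for c in candidates}

print(len(candidates))  # 63 candidates for just 6 distinct items
```

The exponential number of candidates, each requiring a full database scan, is exactly what the strategies below try to avoid.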
The Apriori Principle states that if an itemset is frequent, then all of its subsets must also be frequent; equivalently, if an itemset is infrequent, then all of its supersets must be infrequent. This principle holds true because of the anti-monotone property of support.
This means that when f is anti-monotone (or downward closed) and X is a subset of Y, the support of Y, s(Y), must not exceed the support of X, s(X): adding items to an itemset can only shrink (or preserve) the set of transactions that contain it.
The converse holds when f is monotone (or upward closed): if X is a subset of Y, then f(X) must not exceed f(Y). (Support itself is anti-monotone, not monotone, which is what makes the Apriori pruning valid.)
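The anti-monotone property of support can be verified directly on a small example. This sketch (hypothetical data, illustrative names) checks that for every pair of itemsets X ⊆ Y, s(X) ≥ s(Y):

```python
from itertools import combinations

# Hypothetical toy transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]

def support(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

items = sorted(set().union(*transactions))
itemsets = [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)]

# Anti-monotone: support never increases as an itemset grows.
for X in itemsets:
    for Y in itemsets:
        if X < Y:  # X is a proper subset of Y
            assert support(X) >= support(Y)
print("anti-monotone property holds on this database")
```

This is why the Apriori Algorithm can safely prune every superset of an infrequent itemset: none of them can reach the support threshold.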