The objectives of this section are:
to introduce you to the decision tree classifier
to explain the various parts of a decision tree
to present you with some advantages of using the decision tree classifier
to define Occam’s Razor and its application in decision tree construction
By the time you have completed this section you will be able to:
recognize the various parts of a decision tree
list some advantages of the decision tree classifier
identify the best tree based on Occam's Razor
As shown above, there are many ways to build a decision tree: for any given dataset there are exponentially many possible trees, so it is not computationally feasible to find the optimal tree for each dataset. However, algorithms have been developed that address this problem by producing good, if suboptimal, decision trees within a reasonable time frame. Before we discuss these algorithms, it is important to highlight the different measures of impurity used to determine which attribute should be split on first.
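To make the idea of impurity concrete before we define the measures formally, here is a minimal sketch (assuming Python with NumPy; the function names are my own, for illustration only) of two commonly used impurity measures, Gini impurity and entropy, computed from the class labels of the records that reach a node:

    import numpy as np

    def gini(labels):
        """Gini impurity: 1 minus the sum of squared class proportions."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def entropy(labels):
        """Shannon entropy: -sum(p * log2(p)) over the class proportions."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # A pure node has impurity 0; an even 50/50 split is maximally impure.
    print(gini(["yes", "yes", "no", "no"]))     # 0.5
    print(entropy(["yes", "yes", "no", "no"]))  # 1.0
    print(gini(["yes", "yes", "yes", "yes"]))   # 0.0

The intuition is that the attribute whose split produces child nodes with the lowest combined impurity is the one the algorithm chooses to split on first.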
Once upon a time there was a man named Occam who loved to shave. One day, while shaving with a tool he had invented called a razor, he realized that he did not have to go over his whole face twice: if he made his strokes across instead, he could achieve the same result in less time. That, in essence, is what Occam's Razor is about.
Okay, so maybe I went a little too far with the fabrication: there was no razor and no shaving stick, but the theme of this made-up tale is crucial in helping you remember Occam's Razor and why it is important. Basically, it means that the simplest solution or hypothesis for a problem should be preferred over more complicated ones. So when it comes to classification and decision tree classifiers, we should prefer the shortest decision tree that fits the data, because it is more efficient and less likely to overfit.
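As a rough illustration of the principle, a sketch like the following (assuming Python with scikit-learn; the dataset and depth values are arbitrary choices made just for this demonstration) grows trees of different sizes so you can compare node count against held-out accuracy:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Grow trees of increasing depth and compare their size to test accuracy.
    for depth in (1, 2, 3, None):  # None lets the tree grow until leaves are pure
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        tree.fit(X_train, y_train)
        print(f"max_depth={depth}: {tree.tree_.node_count} nodes, "
              f"test accuracy = {tree.score(X_test, y_test):.2f}")

Under Occam's Razor, when two trees classify unseen data about equally well, the one with fewer nodes is the better choice.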