Chapter 1 Decision Trees
Section 3 Efficient Decision Tree Construction
Page 3 Induction Algorithms
Objectives:
The objectives of this section are:
to impart on the reader the need for an efficient construct
to present you with a high level algorithm for creating a decision tree
to inform you of the problems that arise with the algorithm and how they are addressed.
Outcomes:
By the time you have completed this section you will be able to:
compare decision trees and decide whether or not they are efficient
explain from a high level Hunts’ algorithms and the difficulties encountered
Copyright © Rakesh Verma 2009
Decision Tree Induction Algorithms
There are various algorithms that are used to create decision trees. Hunt’s Algorithm is one of the earliest and serves as a basis for some of the more complex algorithms. The decision tree is constructed in a recursive fashion until each path ends in a pure subset (by this we mean each path taken must end with a class chosen). There are three steps that are done until the tree is fully grown.
- Examine the record data and find the best attribute for the first node.
- Split the record data based on this attribute
- Recurse on each corresponding child node choosing other attributes
Figure 1 is a slide depicting a high level description of Hunt’s algorithm.
Issues that arise
There are two issues that arise when dealing with decision tree design algorithms.
- How should the training data be split? First of all how we split the tree is based on what kind of attributes we are dealing with, if we are dealing with binary attributes we use binary splits but when we are dealing with nominal, ordinal and continuous attributes we can employ either binary or multi-way splits. We pick a certain attribute based on how successful the split will be. We can measure how pure each split will be and then choose the most pure split. How successful these splits can be can be measured by using the GINI, Classification Error and Entropy which are explained in the intermediate level.
- How should the splitting procedure stop? At what point do you decide to stop the tree-growing process? This is another crucial area that must not be overlooked. One might assume that all we need to is to allow the recursive steps play out all the way through to the end but as we will discuss in the next section this might prove counterproductive and might be fruitless.