By the time you have completed this section you will be able to:
define training and testing errors
define overfitting and underfitting
explain the balancing act needed to avoid either extreme
In a perfect world there would be no disease, and a perfect classifier model would make no errors; but we do not live in a perfect world, so we have to deal with errors. Once a tree has been constructed, there are two main types of errors we encounter with decision tree classifiers: training errors and generalization errors. Training errors are misclassification errors committed on the training records: if you run the training records back through the model you have just built, every record that lands in the wrong class counts as a training error. Generalization errors (also called test errors) are the errors observed when the model is applied to test records, that is, records it has never seen before.
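The distinction is easy to see in code. The following is a minimal sketch, assuming scikit-learn is available; the synthetic dataset and all parameter values here are illustrative choices, not taken from the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic records standing in for a real dataset (illustrative only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out 30% of the records as the "previously unseen" test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Build the tree on the training records only.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Training error: misclassifications on the records the tree was built from.
training_error = 1 - tree.score(X_train, y_train)

# Generalization (test) error: misclassifications on unseen records.
generalization_error = 1 - tree.score(X_test, y_test)

print(f"training error:       {training_error:.3f}")
print(f"generalization error: {generalization_error:.3f}")
```

Run on data like this, an unrestricted tree typically reports a training error near zero while the generalization error stays noticeably higher, which leads directly into the trade-off discussed next.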
The best-case scenario is a model that shows both a low training error and a low generalization error when tested, but this is hard to achieve; building the best model is a balancing act. A very low training error might seem ideal, but it often comes at the cost of a high generalization (test) error: to drive the training error down we fit the model too closely to the training data, a situation known as Model Overfitting. The reverse is also true: we experience Model Underfitting when the training error itself is high because the model fits the training data too loosely, failing to capture its structure.
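One way to watch this see-saw in action is to vary the depth of the tree. The sketch below reuses the same assumed scikit-learn setup as above; the specific depth values are arbitrary illustrations. A very shallow tree underfits (high training error), while an unrestricted tree tends to overfit (near-zero training error but a higher test error).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Sweep the tree depth from very shallow to unrestricted.
for depth in (1, 3, 10, None):  # None grows the tree until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_err = 1 - tree.score(X_train, y_train)
    test_err = 1 - tree.score(X_test, y_test)
    print(f"max_depth={depth}: train error {train_err:.3f}, "
          f"test error {test_err:.3f}")
```

As depth grows, the training error falls steadily, but past some point the test error stops improving and can start to rise: that turning point is where overfitting begins.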
INSERT ANIMATION IF POSSIBLE, INVOLVING NODES OR A GRAPH, SHOWING THE SEE-SAW TRADE-OFF BETWEEN TRAINING ERROR AND GENERALIZATION ERROR