Imagine you are building a decision tree to predict whether a personal loan given to a person would result in a payoff (i.e., the person pays off the loan) or a default (the person fails to pay back the loan). Your entire data set consists of 30 examples: 16 belong to the default class and 14 belong to the payoff class. The examples contain two features, Balance and Residence:

- Balance refers to the amount of money the person has in their savings and checking accounts at the time of the loan, and can take on two values: < 50K or ≥ 50K.
- Residence refers to whether the person owns their home or rents, and can take on two values: OWN or RENT.

As stated, the class label y can be either default or payoff.

Before splitting the parent region into child regions, the entropy of y in the parent region is

$$H(y, \text{parent}) = -\frac{16}{30}\log_2\left(\frac{16}{30}\right) - \frac{14}{30}\log_2\left(\frac{14}{30}\right) \approx 0.99$$

After splitting the parent region into two child regions $C_L$ and $C_R$, the weighted average of the entropy of y over the child regions is the sum

$$r_{C_L} \, H(y, C_L) + r_{C_R} \, H(y, C_R)$$

where $r_{C_L}$ is the number of examples in child region $C_L$ divided by the total number of examples in the parent region prior to the split, and $r_{C_R}$ is the number of examples in child region $C_R$ divided by the total number of examples in the parent region prior to the split. The goal is to find a good split that yields a low weighted average entropy over the partitioned child regions.

Calculate the entropy for the parent node, then see how much uncertainty remains after splitting on Balance. In the graphic below, the blue circles represent people from the payoff class and the red circles represent people from the default class. Splitting the parent node on the Balance attribute gives us two child nodes, Balance < 50K and Balance ≥ 50K. Review the graphic to see how the data is split.

[Figure: The entire population at the root node (30 instances; p(default) = 16/30 = 0.53, p(payoff) = 14/30 = 0.47; entropy 0.99) is split on Balance into two child nodes: Balance < 50K (13 instances: 12 default, 1 payoff; p(default) = 12/13 = 0.92, p(payoff) = 1/13 = 0.08) and Balance ≥ 50K (17 instances: 4 default, 13 payoff; p(default) = 4/17 = 0.24, p(payoff) = 13/17 = 0.76).]

What is the weighted average entropy of the two child regions if we split the root node on Balance?
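To make the entropy calculation concrete, here is a minimal Python sketch (the `entropy` helper and its name are my own illustration, not part of the lesson) that reproduces the parent-node value of 0.99 from the class counts:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution, given raw class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Parent region: 16 default examples and 14 payoff examples.
print(entropy([16, 14]))  # ~0.9968, which rounds to the 0.99 quoted above
```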
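Applying the same helper to the child regions shows how the weighted average $r_{C_L} \, H(y, C_L) + r_{C_R} \, H(y, C_R)$ is assembled. The class counts (12/1 and 4/13) are taken directly from the figure above; this sketch is one way to check your answer to the question:

```python
# Child regions after splitting the root node on Balance:
#   Balance <  50K: 12 default,  1 payoff (13 of the 30 examples)
#   Balance >= 50K:  4 default, 13 payoff (17 of the 30 examples)
left, right = [12, 1], [4, 13]
n_left, n_right = sum(left), sum(right)
n_total = n_left + n_right  # 30 examples in the parent region

weighted = (n_left / n_total) * entropy(left) + (n_right / n_total) * entropy(right)
print(entropy(left))   # ~0.39 (mostly default, so little uncertainty remains)
print(entropy(right))  # ~0.79
print(weighted)        # ~0.62, noticeably below the parent entropy of 0.99
```

The drop from 0.99 to about 0.62 is what the lesson means by a good split: conditioning on Balance removes a substantial amount of uncertainty about y.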