Criteria Class
This class is used to evaluate the quality of a split while fitting a decision tree.
There are several default Criteria classes implemented in adaXT that can be loaded directly. It is also possible to create a custom criteria class; this is explained in the documentation on creating a custom criteria.
Criteria classes can be loaded as follows:
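The example below is a minimal sketch assuming the `adaXT.criteria` and `adaXT.decision_tree` import paths; the criteria class itself (not an instance) is passed to the tree constructor.

```python
from adaXT.criteria import Gini_index, Entropy, Squared_error
from adaXT.decision_tree import DecisionTree

# Pass the criteria class itself (not an instance) to the tree;
# constructor signature assumed from the package's documented usage.
tree = DecisionTree("Classification", criteria=Gini_index)
```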
A list of all available Criteria classes is given below.
Criteria
The base Criteria class from which all other criteria need to inherit.
Entropy
Entropy-based criteria, which can be used for classification. Formally, given class labels \(\mathcal{L}\), the entropy in a node consisting of samples \(I\) is given by $$ \text{Entropy} = - \sum_{k\in\mathcal{L}} P[k] \log_2 (P[k]), $$ where \(P[k]\) denotes the fraction of samples in \(I\) with class label \(k\).
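For illustration, a minimal NumPy sketch of this formula (not adaXT's internal, compiled implementation; the function name is hypothetical):

```python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    # P[k]: fraction of samples in the node with class label k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return -np.sum(p * np.log2(p))

print(entropy(np.array([0, 0, 1, 1])))  # 1.0: two equally frequent classes
```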
Gini_index
Gini-index-based criteria, which can be used for classification. Formally, given class labels \(\mathcal{L}\), the Gini index in a node consisting of samples \(I\) is given by $$ \text{Gini\_index} = 1 - \sum_{k\in \mathcal{L}} P[k]^2, $$ where \(P[k]\) denotes the fraction of samples in \(I\) with class label \(k\).
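The analogous sketch for the Gini index (again illustrative, with a hypothetical function name):

```python
import numpy as np

def gini_index(labels: np.ndarray) -> float:
    # P[k]: fraction of samples in the node with class label k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

print(gini_index(np.array([0, 0, 1, 1])))  # 0.5: maximal for two balanced classes
```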
Partial_linear
Criteria based on fitting a linear function in the first predictor variable in each leaf. Formally, in a node consisting of samples \(I\), it is given by $$ \text{Partial\_linear} = \tfrac{1}{|I|}\sum_{i \in I} (Y[i] - \widehat{\theta}_0 - \widehat{\theta}_1 X[i, 0])^2, $$ where \(Y[i]\) and \(X[i, 0]\) denote the response value and the value of the first feature at sample \(i\), respectively, and \((\widehat{\theta}_0, \widehat{\theta}_1)\) are ordinary least squares regression estimates when regressing \(Y[i]\) on \(X[i, 0]\) using the samples in \(I\).
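A minimal NumPy sketch of this criteria (illustrative only; `partial_linear` is a hypothetical name, and the OLS fit is done with `np.polyfit`):

```python
import numpy as np

def partial_linear(X: np.ndarray, Y: np.ndarray) -> float:
    # OLS fit of Y on X[:, 0]; polyfit with deg=1 returns (theta_1, theta_0)
    theta_1, theta_0 = np.polyfit(X[:, 0], Y, deg=1)
    residuals = Y - theta_0 - theta_1 * X[:, 0]
    return float(np.mean(residuals ** 2))
```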
Partial_quadratic
Criteria based on fitting a quadratic function in the first predictor variable in each leaf. Formally, in a node consisting of samples \(I\), it is given by $$ \text{Partial\_quadratic} = \tfrac{1}{|I|}\sum_{i \in I} (Y[i] - \widehat{\theta}_0 - \widehat{\theta}_1 X[i, 0] - \widehat{\theta}_2 X[i, 0]^2)^2, $$ where \(Y[i]\) and \(X[i, 0]\) denote the response value and the value of the first feature at sample \(i\), respectively, and \((\widehat{\theta}_0, \widehat{\theta}_1, \widehat{\theta}_2)\) are ordinary least squares regression estimates when regressing \(Y[i]\) on \(X[i, 0]\) and \(X[i, 0]^2\) using the samples in \(I\).
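The same sketch with a quadratic fit (illustrative; `np.polyfit` with `deg=2` returns the coefficients as \((\widehat{\theta}_2, \widehat{\theta}_1, \widehat{\theta}_0)\)):

```python
import numpy as np

def partial_quadratic(X: np.ndarray, Y: np.ndarray) -> float:
    # OLS fit of Y on X[:, 0] and X[:, 0]**2
    theta_2, theta_1, theta_0 = np.polyfit(X[:, 0], Y, deg=2)
    x0 = X[:, 0]
    residuals = Y - theta_0 - theta_1 * x0 - theta_2 * x0 ** 2
    return float(np.mean(residuals ** 2))
```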
Squared_error
Squared-error-based criteria, which can be used for regression and leads to standard CART splits. Formally, the squared error in a node consisting of samples \(I\) is given by $$ \text{Squared\_error} = \tfrac{1}{|I|}\sum_{i\in I} \Big(Y[i] - \tfrac{1}{|I|}\sum_{j\in I} Y[j]\Big)^2, $$ where \(Y[i]\) denotes the response value at sample \(i\).
For speed, it is computed via the equivalent formula $$ \text{Squared\_error} = \tfrac{1}{|I|}\sum_{i\in I} Y[i]^2 - \Big(\tfrac{1}{|I|}\sum_{i\in I} Y[i]\Big)^2. $$
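The two expressions agree because of the variance identity \(E[(Y - E[Y])^2] = E[Y^2] - E[Y]^2\); a quick numerical check (illustrative):

```python
import numpy as np

Y = np.array([1.0, 2.0, 4.0, 7.0])

direct = np.mean((Y - Y.mean()) ** 2)   # definition: mean squared deviation
fast = np.mean(Y ** 2) - Y.mean() ** 2  # one-pass form used for speed

print(np.isclose(direct, fast), np.isclose(direct, np.var(Y)))  # True True
```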