
# Criteria Class

This class is used to evaluate the quality of a split while fitting a decision tree.

There are several default Criteria classes implemented in adaXT that can be directly loaded. It is also possible to create a custom criteria class, which is explained here.

Criteria classes can be loaded as follows:

```python
from adaXT.criteria import CRITERIA_NAME
```

A list of all available Criteria classes is given below.
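For example, a criteria class can be passed to a decision tree when it is constructed. The following is a minimal sketch; the `DecisionTree` import path and its `criteria` keyword are assumed here and may differ between adaXT versions:

```python
import numpy as np
from adaXT.criteria import Gini_index
from adaXT.decision_tree import DecisionTree  # assumed import path

X = np.random.rand(100, 2)        # toy feature matrix
Y = np.random.randint(0, 2, 100)  # toy binary labels

# The criteria class itself (not an instance) is passed to the tree
tree = DecisionTree("Classification", criteria=Gini_index)
tree.fit(X, Y)
```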

## Criteria

The base Criteria class from which all other criteria need to inherit.

## Entropy

Entropy based criteria, which can be used for classification. Formally, given class labels \(\mathcal{L}\), the entropy in a node consisting of samples \(I\) is given by $$ \text{Entropy} = - \sum_{k\in\mathcal{L}} P[k] \log_2 (P[k]), $$ where \(P[k]\) denotes the fraction of samples in \(I\) with class label \(k\).
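The following plain NumPy sketch illustrates the formula; it is a reimplementation for illustration only, not adaXT's internal code:

```python
import numpy as np

def entropy(y: np.ndarray) -> float:
    """Entropy of the labels in a node: -sum_k P[k] * log2(P[k])."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / y.size  # P[k]: fraction of samples with class label k
    return float(-np.sum(p * np.log2(p)))

print(entropy(np.array([0, 0, 1, 1])))  # 1.0, a maximally mixed binary node
```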

## Gini_index

Gini index based criteria, which can be used for classification. Formally, given class labels \(\mathcal{L}\), the Gini index in a node consisting of samples \(I\) is given by $$ \text{Gini_index} = 1 - \sum_{k\in \mathcal{L}} P[k]^2, $$ where \(P[k]\) denotes the fraction of samples in \(I\) with class label \(k\).
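The same kind of illustrative NumPy sketch for the Gini index:

```python
import numpy as np

def gini_index(y: np.ndarray) -> float:
    """Gini index of the labels in a node: 1 - sum_k P[k]^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / y.size  # P[k]: fraction of samples with class label k
    return float(1.0 - np.sum(p ** 2))

print(gini_index(np.array([0, 0, 1, 1])))  # 0.5, a maximally mixed binary node
```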

## Partial_linear

Criteria based on fitting a linear function in the first predictor variable in each leaf. Formally, in a node consisting of samples \(I\), it is given by $$ \text{Partial_linear} = \tfrac{1}{|I|}\sum_{i \in I} (Y[i] - \widehat{\theta}_0 - \widehat{\theta}_1 X[i, 0])^2, $$ where \(Y[i]\) and \(X[i, 0]\) denote the response value and the value of the first feature at sample \(i\), respectively, and \((\widehat{\theta}_0, \widehat{\theta}_1)\) are ordinary least squares regression estimates when regressing \(Y[i]\) on \(X[i, 0]\) using the samples in \(I\).
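An illustrative NumPy sketch of this quantity (not adaXT's internal implementation): fit the OLS coefficients on the first feature and return the mean squared residual:

```python
import numpy as np

def partial_linear(X: np.ndarray, y: np.ndarray) -> float:
    """Mean squared residual of an OLS fit of y on X[:, 0]."""
    A = np.column_stack([np.ones(len(y)), X[:, 0]])  # design matrix [1, x]
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)    # (theta_0, theta_1)
    return float(np.mean((y - A @ theta) ** 2))
```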

## Partial_quadratic

Criteria based on fitting a quadratic function in the first predictor variable in each leaf. Formally, in a node consisting of samples \(I\), it is given by $$ \text{Partial_quadratic} = \tfrac{1}{|I|}\sum_{i \in I} (Y[i] - \widehat{\theta}_0 - \widehat{\theta}_1 X[i, 0] - \widehat{\theta}_2 X[i, 0]^2)^2, $$ where \(Y[i]\) and \(X[i, 0]\) denote the response value and the value of the first feature at sample \(i\), respectively, and \((\widehat{\theta}_0, \widehat{\theta}_1, \widehat{\theta}_2)\) are ordinary least squares regression estimates when regressing \(Y[i]\) on \(X[i, 0]\) and \(X[i, 0]^2\) using the samples in \(I\).
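The corresponding illustrative sketch simply adds the squared feature as a second regressor:

```python
import numpy as np

def partial_quadratic(X: np.ndarray, y: np.ndarray) -> float:
    """Mean squared residual of an OLS fit of y on X[:, 0] and X[:, 0]^2."""
    x = X[:, 0]
    A = np.column_stack([np.ones(len(y)), x, x ** 2])  # design matrix [1, x, x^2]
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)      # (theta_0, theta_1, theta_2)
    return float(np.mean((y - A @ theta) ** 2))
```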

## Squared_error

Squared error based criteria, which can be used for regression and leads to standard CART splits. Formally, the squared error in a node consisting of samples \(I\) is given by $$ \text{Squared_error} = \tfrac{1}{|I|}\sum_{i\in I} \Big(Y[i] - \tfrac{1}{|I|}\sum_{j\in I} Y[j]\Big)^2, $$ where \(Y[i]\) denotes the response value at sample \(i\) and the inner sum is the mean response in the node.

For a faster but equivalent calculation, it is computed as $$ \text{Squared_error} = \tfrac{1}{|I|}\sum_{i\in I} Y[i]^2 - \Big(\tfrac{1}{|I|}\sum_{i\in I} Y[i]\Big)^2. $$
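The two expressions can be checked against each other numerically (pure NumPy, for illustration):

```python
import numpy as np

y = np.random.rand(1000)

direct = np.mean((y - y.mean()) ** 2)   # definition: mean squared deviation
fast = np.mean(y ** 2) - y.mean() ** 2  # one-pass form used for speed
assert np.isclose(direct, fast)
```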