
DecisionTree Class

This is the class used to construct a decision tree. It uses four individual components (the Criteria, LeafBuilder, Predict and Splitter classes) to construct specific types of decision trees that can then be applied to data.

Instead of specifying the criteria, leaf_builder and predict components individually, it is also possible to only specify the tree_type, which then internally selects the corresponding default components for several established tree algorithms; see the user guide.

For more advanced modifications, it might be necessary to change how the splitting is performed. This can be done by passing a custom Splitter class.

The DecisionTree class can be imported as follows:

```python
from adaXT.decision_tree import DecisionTree
```

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| max_depth | int | The maximum depth of the tree. |
| tree_type | str | The type of tree, either a string specifying a supported type (currently "Regression", "Classification", "Quantile" or "Gradient") or None. |
| leaf_nodes | list[LeafNode] | A list of all leaf nodes in the tree. |
| root | Node | The root node of the tree. |
| n_nodes | int | The number of nodes in the tree. |
| n_features | int | The number of features in the training data. |
| n_rows | int | The number of rows (i.e., samples) in the training data. |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tree_type | str \| None | The type of tree, either a string specifying a supported type (currently "Regression", "Classification", "Quantile" or "Gradient") or None. | None |
| max_depth | int | The maximum depth of the tree. | maxsize |
| impurity_tol | float | The tolerance of impurity in a leaf node. | 0 |
| max_features | int \| float \| Literal['sqrt', 'log2'] \| None | The number of features to consider when looking for a split. | None |
| min_samples_split | int | The minimum number of samples in a split. | 1 |
| min_samples_leaf | int | The minimum number of samples in a leaf node. | 1 |
| min_improvement | float | The minimum improvement gained from performing a split. | 0 |
| criteria | Type[Criteria] \| None | The Criteria class to use; if None it defaults to the tree_type default. | None |
| leaf_builder | Type[LeafBuilder] \| None | The LeafBuilder class to use; if None it defaults to the tree_type default. | None |
| predict | Type[Predict] \| None | The Predict class to use; if None it defaults to the tree_type default. | None |
| splitter | Type[Splitter] \| None | The Splitter class to use; if None it defaults to the default Splitter class. | None |
| skip_check_input | bool | Skips input validation on the features and response in the fitting function of a tree; should only be used if you know what you are doing. | False |
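As a rough illustration of how these parameters are typically combined, the following sketch constructs two trees via tree_type and leaves the component classes at their defaults; the parameter values are chosen purely for illustration:

```python
from adaXT.decision_tree import DecisionTree

# Regression tree: criteria, leaf_builder and predict are selected
# internally from the tree_type default components.
reg_tree = DecisionTree(
    tree_type="Regression",
    max_depth=5,
    min_samples_leaf=2,
)

# Classification tree with a minimum number of samples required per split.
clf_tree = DecisionTree(tree_type="Classification", min_samples_split=4)
```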

fit

fit(X, Y, sample_indices=None, sample_weight=None)

Fit the decision tree with training data (X, Y).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like object of dimension 2 | The feature values used for training. Internally it will be converted to np.ndarray with dtype=np.float64. | required |
| Y | array-like object of 1 or 2 dimensions | The response values used for training. Internally it will be converted to np.ndarray with dtype=np.float64. | required |
| sample_indices | array-like object of dimension 1 \| None | A vector specifying which samples of the training data to use during training. If None, all samples are used. | None |
| sample_weight | array-like object of dimension 1 \| None | Sample weights. May not be implemented for every criteria. | None |
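A minimal sketch of a fitting call, using only the parameters documented above; the synthetic data, the row subset and the uniform sample weights are purely illustrative:

```python
import numpy as np
from adaXT.decision_tree import DecisionTree

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # 2-dimensional feature array
Y = X[:, 0] + 0.1 * rng.normal(size=100)    # 1-dimensional response

tree = DecisionTree(tree_type="Regression", max_depth=4)

# Fit on a subset of the rows and pass (uniform) sample weights.
tree.fit(
    X,
    Y,
    sample_indices=np.arange(80),   # use only the first 80 rows
    sample_weight=np.ones(100),     # may not be supported by every criteria
)

# The attributes described above are available after fitting.
print(tree.n_nodes, tree.n_features)
```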

predict

predict(X, **kwargs)

Predict response values at X using the fitted decision tree. The behavior of this function is determined by the Predict class used in the decision tree. For the currently existing tree types the corresponding behavior is as follows:

- Classification: Returns the class with the highest proportion within the final leaf node. If predict_proba=True is passed, it instead returns the class probability distribution within the final leaf node.
- Regression: Returns the mean value of the response within the final leaf node.
- Quantile: Returns conditional quantiles of the response, where the quantiles are specified by passing a list via the quantile parameter.
- Gradient: Returns a matrix with columns corresponding to different orders of derivatives, which can be selected via the orders parameter. The default behavior is to compute orders 0, 1 and 2.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like object of dimension 2 | New samples at which to predict the response. Internally it will be converted to np.ndarray with dtype=np.float64. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | (N, K) numpy array with the predictions, where K depends on the Predict class and is generally 1. |
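The sketch below illustrates the tree-type specific behavior described above for a classification tree and a quantile tree; the synthetic data and the chosen quantiles are only for illustration, and the keyword arguments (predict_proba, quantile) are the ones named in this section:

```python
import numpy as np
from adaXT.decision_tree import DecisionTree

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
X_new = rng.normal(size=(5, 2))

# Classification: majority class per leaf, or the class distribution
# when predict_proba=True is passed.
clf = DecisionTree(tree_type="Classification")
clf.fit(X, (X[:, 0] > 0).astype(np.float64))
labels = clf.predict(X_new)
proba = clf.predict(X_new, predict_proba=True)

# Quantile: one column per requested quantile, passed via `quantile`.
qtree = DecisionTree(tree_type="Quantile")
qtree.fit(X, X[:, 0] + 0.1 * rng.normal(size=200))
quantiles = qtree.predict(X_new, quantile=[0.1, 0.5, 0.9])
```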

predict_leaf

predict_leaf(X)

Computes a hash table indexing which LeafNode each row of the provided X falls into.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like object of dimension 2 | 2-dimensional array whose rows are the samples at which to predict. | required |

Returns:

| Type | Description |
| --- | --- |
| dict | A hash table with keys corresponding to LeafNode ids and values corresponding to lists of indices of the rows that land in a given LeafNode. |
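A short sketch of how the returned hash table might be inspected; the synthetic data and the choice of predicting on the first ten rows are only for illustration:

```python
import numpy as np
from adaXT.decision_tree import DecisionTree

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
Y = X[:, 0] ** 2

tree = DecisionTree(tree_type="Regression", max_depth=3)
tree.fit(X, Y)

leaf_index = tree.predict_leaf(X[:10])
# Keys are LeafNode ids, values are lists of row indices of the input
# that fall into the corresponding leaf.
for leaf_id, rows in leaf_index.items():
    print(leaf_id, rows)
```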

predict_weights

predict_weights(X=None, scale=True)

Predicts a weight matrix W, where W[i, j] indicates whether X[i, :] and Xtrain[j, :] are in the same leaf node, with Xtrain denoting the training data. If scale is True, the value is divided by the number of other training samples in the same leaf node.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | ArrayLike \| None | New samples for which to predict weights (corresponding to rows in the output). If None, the training data is used as X. | None |
| scale | bool | Whether to do row-wise scaling. | True |

Returns:

| Type | Description |
| --- | --- |
| ndarray | A numpy array of shape MxN, where M denotes the number of rows of X and N the number of rows of the original training data. |
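As a sketch, the following computes weight matrices for a handful of new samples and for the training data itself (X=None); the data is synthetic and the shapes printed follow the description above:

```python
import numpy as np
from adaXT.decision_tree import DecisionTree

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
Y = X[:, 0] + X[:, 1]

tree = DecisionTree(tree_type="Regression", max_depth=3)
tree.fit(X, Y)

X_new = rng.normal(size=(5, 2))
W = tree.predict_weights(X_new, scale=True)   # shape (5, 60): rows correspond to X_new
W_train = tree.predict_weights()              # X=None: weights of the training data

print(W.shape, W_train.shape)
```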

refit_leaf_nodes

refit_leaf_nodes(X, Y, sample_weight=None, sample_indices=None, **kwargs)

Refits the leaf nodes in a previously fitted decision tree.

More precisely, the method removes all leaf nodes created during the initial fit and replaces them by predicting the leaf membership of all samples in X selected by sample_indices and placing those samples into new leaf nodes.

This method can be used to update the leaf nodes of a decision tree based on new data while keeping the original splitting rules. If X does not contain the original training data, the tree structure might change, as leaf nodes without samples are collapsed. The method is also used to create honest splitting in RandomForests.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like object of dimension 2 | The feature values used for training. Internally it will be converted to np.ndarray with dtype=np.float64. | required |
| Y | array-like object of dimension 1 or 2 | The response values used for training. Internally it will be converted to np.ndarray with dtype=np.float64. | required |
| sample_weight | array-like object of dimension 1 \| None | Sample weights. May not be implemented for all criteria. | None |
| sample_indices | ArrayLike \| None | Indices of X with which to create the new leaf nodes. | None |
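A minimal sketch of an honest-style refit, where the tree structure is learned on one half of the data and the leaf nodes are repopulated with the other half; the particular split of the data is purely illustrative:

```python
import numpy as np
from adaXT.decision_tree import DecisionTree

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
Y = X[:, 0]

tree = DecisionTree(tree_type="Regression", max_depth=3)

# Learn the splitting rules on the first half of the data ...
tree.fit(X[:50], Y[:50])

# ... then replace the leaf nodes using the second half, keeping the
# original splitting rules.
tree.refit_leaf_nodes(X[50:], Y[50:])
```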

similarity

similarity(X0, X1)

Computes a similarity matrix W of size NxM, where each element W[i, j] is 1 if and only if X0[i, :] and X1[j, :] end up in the same leaf node.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X0 | ArrayLike | Array corresponding to the rows of W in the output. | required |
| X1 | ArrayLike | Array corresponding to the columns of W in the output. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | An NxM shaped np.ndarray, where N is the number of rows of X0 and M the number of rows of X1. |
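A brief sketch of computing the similarity matrix between two sets of samples; the synthetic data and the chosen shapes are only for illustration:

```python
import numpy as np
from adaXT.decision_tree import DecisionTree

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 2))
Y = X[:, 0] - X[:, 1]

tree = DecisionTree(tree_type="Regression", max_depth=3)
tree.fit(X, Y)

X0 = rng.normal(size=(4, 2))
X1 = rng.normal(size=(6, 2))

S = tree.similarity(X0, X1)   # shape (4, 6); S[i, j] is 1 iff X0[i] and X1[j] share a leaf
print(S)
```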