dabl.AnyClassifier

class dabl.AnyClassifier(n_jobs=None, force_exhaust_budget=True, verbose=0, type_hints=None, portfolio='baseline')

Classifier with automatic model selection.

This model uses successive halving on a portfolio of complex models (HistGradientBoosting, RandomForest, SVC, LogisticRegression) to pick the best model family and hyper-parameters.

AnyClassifier internally applies EasyPreprocessor, so no preprocessing is necessary.
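The idea of successive halving can be illustrated with a small toy: score every candidate on a small budget, keep the better half, then double the budget and repeat. The sketch below is not dabl's implementation; the function `successive_halving`, the scoring function `toy_score`, and the candidate dictionaries are hypothetical and exist only for illustration.

```python
def successive_halving(candidates, evaluate, min_budget, max_budget, ratio=2):
    """Toy successive halving (illustrative only, not dabl's code).

    Each round scores the surviving candidates on the current budget,
    keeps the best 1/ratio of them, and multiplies the budget by ratio.
    """
    survivors = list(candidates)
    budget = min_budget
    history = []  # (budget, number of candidates evaluated) per round
    while True:
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        history.append((budget, len(scored)))
        if len(scored) == 1 or budget >= max_budget:
            return scored[0], history
        survivors = scored[:max(1, len(scored) // ratio)]
        budget = min(max_budget, budget * ratio)


def toy_score(candidate, budget):
    # Hypothetical score: approaches the candidate's quality as the budget grows.
    return candidate["quality"] * budget / (budget + 50)


qualities = [0.61, 0.72, 0.55, 0.90, 0.68, 0.83, 0.47, 0.79]
candidates = [{"name": f"model_{i}", "quality": q} for i, q in enumerate(qualities)]

best, history = successive_halving(candidates, toy_score, min_budget=100, max_budget=800)
print(best["name"])
print(history)
```

In this configuration the last surviving candidate is evaluated on the full budget of 800 samples. In general, whether the final round reaches the full dataset depends on how the number of candidates and the budgets line up, which is the situation the `force_exhaust_budget` parameter addresses.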

Parameters
n_jobs : int, default=None

Number of processes to spawn for parallelizing the search.

force_exhaust_budget : bool, default=True

Whether to ensure at least one model is trained on the full dataset in successive halving. See the documentation of successive halving for details.

verbose : int, default=0

Verbosity. Higher means more output.

type_hints : dict or None, default=None

If dict, provide type information for columns. Keys are column names, values are types as provided by detect_types.

portfolio : str, default='baseline'

Portfolio of candidate models to search over. Use 'baseline' for multiple classifiers with default parameters, 'hgb' for high-performing HistGradientBoostingClassifiers, 'svc' for high-performing support vector classifiers, 'rf' for high-performing random forest classifiers, 'lr' for high-performing logistic regression classifiers, or 'mixed' for a portfolio of different high-performing classifiers.

Attributes
search_ : SuccessiveHalving instance

Fitted GridSuccessiveHalving instance for inspection.

est_ : sklearn estimator

Best estimator (pipeline) found during search.

__init__(n_jobs=None, force_exhaust_budget=True, verbose=0, type_hints=None, portfolio='baseline')

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None, *, target_col=None)

Fit estimator.

The target must be given either as a separate 1d array or Series y (in scikit-learn fashion) or as a column of the dataframe X named by target_col. If y is specified, X is assumed not to contain the target.

Parameters
X : DataFrame

Input features. If target_col is specified, X also includes the target.

y : Series or numpy array, optional

Target. You need to specify either y or target_col.

target_col : str or int, optional

Column name of target if included in X.
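The two ways of passing the target can be pictured with a small stand-in. This is only a sketch of the calling convention described above, not dabl's code: the helper `split_target` is hypothetical, a plain dict of columns stands in for a DataFrame, and rejecting the case where both `y` and `target_col` are given is an assumption rather than documented behaviour.

```python
def split_target(columns, y=None, target_col=None):
    """Return (features, target) from a dict of columns (hypothetical helper).

    Exactly one of y or target_col is expected, mirroring the two calling
    styles of fit(X, y) and fit(X, target_col=...).
    """
    if y is None and target_col is None:
        raise ValueError("Specify either y or target_col.")
    if y is not None and target_col is not None:
        # Assumption for this sketch: passing both is treated as an error.
        raise ValueError("Specify only one of y or target_col.")
    if target_col is not None:
        # The target lives inside the table: split it off from the features.
        features = {name: vals for name, vals in columns.items() if name != target_col}
        return features, list(columns[target_col])
    # The target was passed separately: the table holds features only.
    return dict(columns), list(y)


table = {"age": [31, 45, 27], "income": [40.0, 82.5, 33.1], "label": ["no", "yes", "no"]}

# Style 1: target is a column of the table.
features_a, target_a = split_target(table, target_col="label")

# Style 2: target is passed separately, scikit-learn fashion.
features_only = {k: v for k, v in table.items() if k != "label"}
features_b, target_b = split_target(features_only, y=table["label"])
```

Both styles end up with the same features and the same target; they differ only in where the target is stored when the call is made.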

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : mapping of string to any

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns
score : float

Mean accuracy of self.predict(X) with respect to y.
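To see why subset accuracy is harsh in the multi-label case, the following sketch compares it with a per-label accuracy on a tiny hand-made example. The functions `subset_accuracy` and `per_label_accuracy` and the data are hypothetical and unweighted; they illustrate the metric and are not the library's scoring code.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire label set is predicted exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length.")
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual labels predicted correctly (for comparison)."""
    pairs = [(t, p) for row_t, row_p in zip(y_true, y_pred) for t, p in zip(row_t, row_p)]
    return sum(1 for t, p in pairs if t == p) / len(pairs)


# Four samples, three binary labels each (made-up data).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 1, 1]]

subset = subset_accuracy(y_true, y_pred)        # 2 of 4 rows match exactly
per_label = per_label_accuracy(y_true, y_pred)  # 10 of 12 labels match
print(subset, per_label)
```

A single wrong label makes the whole sample count as an error, so two mislabelled entries out of twelve already halve the subset accuracy here.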

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : object

Estimator instance.
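The `<component>__<parameter>` naming used by get_params and set_params can be sketched with a minimal stand-in class. `ToyEstimator` below is hypothetical and is neither a scikit-learn nor a dabl class; it only mimics how nested names are built and resolved, assuming every nested component is itself a `ToyEstimator`.

```python
class ToyEstimator:
    """Hypothetical stand-in that stores its parameters in a dict."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            # Expose parameters of nested estimators as <component>__<parameter>.
            for name, value in self._params.items():
                if isinstance(value, ToyEstimator):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out

    def set_params(self, **params):
        for key, value in params.items():
            # Split only at the first "__": the rest is handled by the component.
            name, sep, rest = key.partition("__")
            if name not in self._params:
                raise ValueError(f"Invalid parameter {name!r}")
            if sep:
                self._params[name].set_params(**{rest: value})
            else:
                self._params[name] = value
        return self


pipeline = ToyEstimator(scaler=ToyEstimator(with_mean=True), clf=ToyEstimator(C=1.0))

returned = pipeline.set_params(clf__C=10.0)  # update a nested parameter
deep_params = pipeline.get_params()          # includes "clf__C", "scaler__with_mean"
shallow_params = pipeline.get_params(deep=False)
```

Because set_params returns the estimator itself, calls can be chained, and get_params(deep=False) lists only the top-level components without their nested parameters.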