Release History

dabl 0.2.5

Added pruning of correlated features and detection of interactions with categorical features to regression plots, #316 by @amueller.
Add detection of lists and dicts in detect_type_series. #317 by @amueller.
Use matplotlib’s stairs for faster histograms. #313 by @amueller.
Add jitter to ordinal features in regression scatter plots, by @amueller.

Added plot_sankey for Sankey plots (or, more precisely, alluvial flow diagrams), #305 by @amueller.
Drop outliers in univariate target plots in regression. #304 by @amueller.

Rely on the Successive Halving implementation from scikit-learn 0.24, removing the old implementation. Consequently, the search module in dabl has been deprecated and the minimum required version of scikit-learn is now 0.24.
The type detection has been completely rewritten and accommodates more edge cases, #270 by @amueller.
A global configuration was introduced that can be set with set_config. For now, this allows users to turn off truncation of labels, by @amueller.
Fix default value of alpha in plot_regression_continuous, #276 by @amueller.
Fix a memory issue when calling bincount on really large integers in the type detection, #275 by @amueller.
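
For code that previously relied on the deprecated search module, the following is a minimal sketch of calling scikit-learn's Successive Halving search directly. It uses only scikit-learn's own API (available from 0.24, behind an experimental import); the synthetic dataset, the decision tree estimator, and the parameter grid are illustrative assumptions and do not reflect how dabl configures its search internally.

```python
# Sketch only: scikit-learn's Successive Halving API, not dabl's internals.
# Requires scikit-learn >= 0.24.
from sklearn.datasets import make_classification
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (enables the next import)
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, chosen only for illustration.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Hypothetical parameter grid: 9 candidates in total.
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 10]}

# Each iteration keeps roughly the best 1/factor of the candidates
# and re-evaluates them on more samples.
search = HalvingGridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    factor=3,
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
```

After fitting, search.best_estimator_ can be used like any other fitted scikit-learn estimator.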

Fix bug in type detection when a column contained boolean data and missing values, #256 by @amueller.
Bundle LICENSE file with project in release, #253 by @dhirschfeld.
Make color usage consistent between scatter plots and mosaic plots, #249 by @h4pZ.
Update the AnyClassifier portfolio to include several new optimized portfolios, #246 by @hp2500.

Ensure target column is not dropped in ‘clean’ for highly imbalanced datasets #171.
Scale histograms separately in class histograms #173.
Shorten really long column names to fix figure layout #180.
Add shuffling to cross-validation for simple models #185.
Fix broken legend for class histograms for ordinal variables #189.
Allow numpy arrays in SimpleRegressor and plot #187.
Add actual vs predicted plot for regression to explain #186.

More robust detection of dirty floats, more robust parsing of categorical variables.
Ensure data is parsed consistently between predict and fit by not calling clean in fit.
Allow passing columns with integer names as target in plot.