dabl.plot.plot_classification_continuous

dabl.plot.plot_classification_continuous(X, target_col, types=None, hue_order=None, scatter_alpha='auto', scatter_size='auto', univariate_plot='histogram', drop_outliers=True, plot_pairwise=True, top_k_interactions=10, random_state=None, **kwargs)

Plots for continuous features in classification.

Selects important continuous features according to F statistics. Creates univariate distribution plots for these, as well as scatter plots for selected pairs of features and scatter plots for selected pairs of PCA directions. If there are more than 2 classes, scatter plots from Linear Discriminant Analysis are also shown. A scatter plot is deemed “interesting” if a decision tree on the two-dimensional projection performs well. The cross-validated macro-average recall of a decision tree is shown in the title of each scatter plot.
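As a rough illustration of the two quantities named above, the following sketch computes a one-way ANOVA F statistic per feature and a macro-average recall in plain Python. It follows the textbook definitions and is not dabl's own code; the helper names (anova_f_statistic, macro_average_recall) and the toy data are hypothetical.

```python
from collections import defaultdict


def anova_f_statistic(values, labels):
    """One-way ANOVA F statistic of one feature, grouped by class label.

    Assumes at least two classes and non-zero within-class variance.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    n_samples = len(values)
    n_classes = len(groups)
    grand_mean = sum(values) / n_samples
    means = {label: sum(g) / len(g) for label, g in groups.items()}

    # Variation of the class means around the overall mean.
    ss_between = sum(
        len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items()
    )
    # Variation of the samples around their own class mean.
    ss_within = sum(
        (value - means[label]) ** 2 for label, g in groups.items() for value in g
    )
    return (ss_between / (n_classes - 1)) / (ss_within / (n_samples - n_classes))


def macro_average_recall(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): one feature separates the classes, one does not.
labels = ["a", "a", "a", "b", "b", "b"]
features = {
    "separating": [1, 2, 3, 7, 8, 9],
    "noise": [1, 5, 9, 2, 5, 8],
}

# Rank features from most to least discriminative.
ranked = sorted(
    features,
    key=lambda name: anova_f_statistic(features[name], labels),
    reverse=True,
)
```

Macro-average recall weights every class equally, so a classifier that always predicts the majority class scores poorly on imbalanced data even when its plain accuracy looks high.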

Parameters

X : dataframe
    Input data including features and target.
target_col : str or int
    Identifier of the target column in X.
types : dataframe of types, optional
    Output of detect_types on X. Can be used to avoid recomputing the types.
scatter_alpha : float, default='auto'
    Alpha value for scatter plots. With 'auto', the value is chosen by an ad hoc heuristic.
scatter_size : float, default='auto'
    Marker size for scatter plots. With 'auto', the value is chosen by an ad hoc heuristic.
univariate_plot : string, default='histogram'
    Type of univariate plot. Supported: 'histogram' and 'kde'.
drop_outliers : bool, default=True
    Whether to drop outliers when plotting.
plot_pairwise : bool, default=True
    Whether to create pairwise plots. Can be a bit slow.
top_k_interactions : int, default=10
    How many features to consider for pairwise interactions (ranked by univariate F scores). Runtime is quadratic in this value, but higher numbers might find more interesting interactions.
random_state : int, None or numpy RandomState
    Random state used for the subsampling that determines which pairwise features to show.
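To make the quadratic cost of top_k_interactions concrete, here is a minimal sketch that enumerates the candidate feature pairs. It assumes that every unordered pair among the top-ranked features is evaluated, which may differ from what dabl actually does; candidate_pairs and the feature names are hypothetical.

```python
from itertools import combinations


def candidate_pairs(ranked_features, top_k_interactions=10):
    """Return all unordered pairs among the top-k ranked features."""
    top = ranked_features[:top_k_interactions]
    return list(combinations(top, 2))


# Hypothetical feature names, already sorted by univariate F score.
ranked_features = [f"feature_{i}" for i in range(30)]

pairs_k10 = candidate_pairs(ranked_features, top_k_interactions=10)
pairs_k20 = candidate_pairs(ranked_features, top_k_interactions=20)
```

Under this assumption the number of pairs is k * (k - 1) / 2, so doubling top_k_interactions from 10 to 20 raises the number of candidate scatter plots from 45 to 190.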

Notes

Important kwargs parameters are scatter_size and scatter_alpha.
