dabl.plot.plot_classification_continuous
- dabl.plot.plot_classification_continuous(X, *, target_col, types=None, hue_order=None, scatter_alpha='auto', scatter_size='auto', univariate_plot='histogram', drop_outliers=True, plot_pairwise=True, top_k_interactions=10, random_state=None, **kwargs)
Plots for continuous features in classification.
Selects important continuous features according to F statistics. Creates univariate distribution plots for these, as well as scatter plots for selected pairs of features and for selected pairs of PCA directions. If there are more than 2 classes, scatter plots from Linear Discriminant Analysis are also shown. A scatter plot is considered “interesting” if a decision tree trained on the two-dimensional projection performs well. The cross-validated macro-average recall of that decision tree is shown in the title of each scatter plot.
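The selection idea described above can be sketched with scikit-learn. This is a minimal illustration, not dabl's actual implementation: the helper names `rank_features_by_f` and `score_pair`, the tree depth, the 5-fold split, and the use of the iris data are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def rank_features_by_f(X, y, top_k):
    """Return indices of the top_k features, ordered by descending F statistic."""
    f_scores, _ = f_classif(X, y)
    return np.argsort(f_scores)[::-1][:top_k]


def score_pair(X, y, i, j, random_state=0):
    """Cross-validated macro-average recall of a shallow tree on features i and j."""
    # max_depth=3 and cv=5 are illustrative choices, not dabl's settings.
    tree = DecisionTreeClassifier(max_depth=3, random_state=random_state)
    scores = cross_val_score(tree, X[:, [i, j]], y, cv=5, scoring="recall_macro")
    return scores.mean()


X, y = load_iris(return_X_y=True)

# Step 1: keep the features with the largest univariate F statistics.
top = rank_features_by_f(X, y, top_k=3)

# Step 2: score every pair of kept features by how well a tree separates classes.
pair_scores = {
    (int(i), int(j)): score_pair(X, y, i, j) for i, j in combinations(top, 2)
}

# The highest-scoring pair would be the most "interesting" scatter plot.
best_pair = max(pair_scores, key=pair_scores.get)
print(best_pair, round(pair_scores[best_pair], 3))
```

A pair whose score is close to 1 separates the classes well in two dimensions, which is the sense in which a projection counts as “interesting” here.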
- Parameters:
- X : dataframe
Input data including features and target.
- target_col : str or int
Identifier of the target column in X.
- types : dataframe of types, optional
Output of detect_types on X. Can be used to avoid recomputing the types.
- scatter_alpha : float, default='auto'
Alpha values for scatter plots. 'auto' selects a value heuristically.
- scatter_size : float, default='auto'
Marker size for scatter plots. 'auto' selects a value heuristically.
- univariate_plot : string, default='histogram'
Supported: 'histogram' and 'kde'.
- drop_outliers : bool, default=True
Whether to drop outliers when plotting.
- plot_pairwise : bool, default=True
Whether to create pairwise plots. Can be a bit slow.
- top_k_interactions : int, default=10
How many features to consider for pairwise interactions (ranked by univariate F scores). Runtime is quadratic in this value, but higher numbers might find more interesting interactions.
- random_state : int, None or numpy RandomState
Random state used when subsampling to determine which pairwise features to show.
Notes
Important kwargs parameters are: scatter_size and scatter_alpha.