Model Building and Assessment

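Feature selection, feature engineering, model selection, hyperparameter optimization, cross-validation, predictive performance evaluation, and classification accuracy comparison tests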

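When you build a high-quality predictive classification model, it is important to select the right features (or predictors) and to tune the hyperparameters (model parameters that are not estimated from the data).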

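Feature selection and hyperparameter tuning can yield multiple models. You can compare the k-fold misclassification rates, receiver operating characteristic (ROC) curves, or confusion matrices of the models. You can also conduct a statistical test to detect whether one classification model significantly outperforms another.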
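
As a minimal sketch of such a comparison, the following code cross-validates two candidate models on the same partition, computes their k-fold misclassification rates and a confusion matrix, and then runs a repeated cross-validation test. The data set (`fisheriris`), the two model types (a decision tree and a 5-nearest-neighbor classifier), and all variable names are assumptions chosen only for illustration.

```matlab
load fisheriris                        % meas: 150-by-4 predictors, species: class labels
rng(0)                                 % make the random partition reproducible
c = cvpartition(species,'KFold',5);    % stratified 5-fold partition shared by both models

% Cross-validate two candidate classifiers on the same folds
treeCV = fitctree(meas,species,'CVPartition',c);
knnCV  = fitcknn(meas,species,'NumNeighbors',5,'CVPartition',c);

% k-fold misclassification rates
treeErr = kfoldLoss(treeCV);
knnErr  = kfoldLoss(knnCV);

% Confusion matrix built from out-of-fold predictions of the tree
treePred = kfoldPredict(treeCV);
[C,order] = confusionmat(species,treePred);

% Repeated cross-validation test of equal accuracy (default: 5-by-2 paired F test)
h = testckfold(templateTree,templateKNN('NumNeighbors',5),meas,meas,species);
```

Here `h = 1` indicates that the test rejects the null hypothesis that the two models have equal accuracy at the default significance level, and `h = 0` indicates that it does not.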

To engineer new features before training a classification model, use `gencfeatures`.
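
The sketch below shows one way this workflow might look: generate features from a training table, then apply the same transformations to held-out data. The `patients` sample data, the choice of `Smoker` as the response, the 30% holdout, and the request for 10 features are all assumptions made for illustration.

```matlab
load patients                          % sample data shipped with MATLAB
Tbl = table(Age,Diastolic,Gender,Height,Systolic,Weight,Smoker);

rng(0)                                 % reproducible holdout split
c = cvpartition(Tbl.Smoker,'Holdout',0.3);
trainTbl = Tbl(training(c),:);
testTbl  = Tbl(test(c),:);

% Generate 10 features from the training data only
[Transformer,newTrainTbl] = gencfeatures(trainTbl,'Smoker',10);

% Inspect the generated features
info = describe(Transformer);

% Apply the same feature transformations to the held-out data
newTestTbl = transform(Transformer,testTbl);
```

Generating features from the training set alone, and only transforming the test set, keeps information from the test data out of the feature engineering step.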

To build and assess classification models interactively, use the Classification Learner app.

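To automatically select a model with tuned hyperparameters, use `fitcauto`. This function tries a selection of classification model types with different hyperparameter values and returns a final model that is expected to perform well on new data. Use `fitcauto` when you are not sure which classifier types best suit your data.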
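
A minimal sketch of automated model selection follows. The data set, the restriction to tree and k-nearest-neighbor learners, the evaluation budget of 20, and the variable names are assumptions chosen to keep the example small; they are not recommended settings.

```matlab
load fisheriris
rng(0)                                 % reproducible split and optimization
c = cvpartition(species,'Holdout',0.2);
XTrain = meas(training(c),:);  YTrain = species(training(c));
XTest  = meas(test(c),:);      YTest  = species(test(c));

% Limit the search and suppress plots and command-line output
opts = struct('MaxObjectiveEvaluations',20,'ShowPlots',false,'Verbose',0);

% Try tree and k-nearest-neighbor models with different hyperparameter values
Mdl = fitcauto(XTrain,YTrain,'Learners',{'tree','knn'}, ...
    'HyperparameterOptimizationOptions',opts);

% Estimate performance of the selected model on data not used for the search
testErr = loss(Mdl,XTest,YTest);
```

Holding out a test set that `fitcauto` never sees gives an independent check of the claim that the returned model generalizes to new data.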

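To tune the hyperparameters of a specific model, select hyperparameter values and cross-validate the model using those values. For example, to tune an SVM model, choose a set of box constraints and kernel scales, and then cross-validate a model for each pair of values. Certain Statistics and Machine Learning Toolbox™ classification functions offer automatic hyperparameter tuning through Bayesian optimization, grid search, or random search. `bayesopt`, the main function for implementing Bayesian optimization, is also flexible enough for many other applications. See Bayesian Optimization Workflow.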
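
The manual SVM tuning described above can be sketched as a small grid search. The `ionosphere` data set, the three candidate values for each hyperparameter, and the variable names are assumptions for illustration only.

```matlab
load ionosphere                        % X: 351-by-34 predictors, Y: labels 'b' or 'g'
rng(1)                                 % reproducible partition
c = cvpartition(Y,'KFold',5);          % reuse the same folds for every pair of values

boxVals   = [0.1 1 10];                % candidate box constraints
scaleVals = [0.1 1 10];                % candidate kernel scales
cvLoss = zeros(numel(boxVals),numel(scaleVals));

for i = 1:numel(boxVals)
    for j = 1:numel(scaleVals)
        % Cross-validate an RBF-kernel SVM for this pair of values
        CVMdl = fitcsvm(X,Y,'KernelFunction','rbf','Standardize',true, ...
            'BoxConstraint',boxVals(i),'KernelScale',scaleVals(j), ...
            'CVPartition',c);
        cvLoss(i,j) = kfoldLoss(CVMdl);
    end
end

% Pick the pair with the lowest cross-validated misclassification rate
[minLoss,idx] = min(cvLoss(:));
[iBest,jBest] = ind2sub(size(cvLoss),idx);
bestBox   = boxVals(iBest);
bestScale = scaleVals(jBest);
```

Reusing one `cvpartition` object for every pair means that differences in loss reflect the hyperparameter values rather than different random folds. For automatic tuning, fitting functions such as `fitcsvm` also accept the `'OptimizeHyperparameters'` name-value argument.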

To interpret a classification model, you can use `lime`, `shapley`, and `plotPartialDependence`.
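
A minimal sketch of these three interpretation tools on one model is shown below. The decision tree, the `fisheriris` data, the choice of the first observation as the query point, and the choice of the third predictor are assumptions for illustration.

```matlab
load fisheriris
Mdl = fitctree(meas,species);          % model to interpret
queryPoint = meas(1,:);                % observation whose prediction is explained

% Shapley values: contribution of each predictor to the prediction at queryPoint
explainer = shapley(Mdl,meas,'QueryPoint',queryPoint);
figure
plot(explainer)

% LIME: fit a simple local model around queryPoint using 2 important predictors
results = lime(Mdl,meas,'QueryPoint',queryPoint,'NumImportantPredictors',2);
figure
plot(results)

% Partial dependence of the class scores on the third predictor
figure
plotPartialDependence(Mdl,3,Mdl.ClassNames)
```

The first two plots explain a single prediction, while the partial dependence plot summarizes how one predictor affects the scores across the whole data set.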

Apps

Classification Learner

Train models to classify data using supervised machine learning

Functions

Feature Selection

`fscchi2`	Univariate feature ranking for classification using chi-square tests (since R2020a)
`fscmrmr`	Rank features for classification using minimum redundancy maximum relevance (MRMR) algorithm (since R2019b)
`fscnca`	Feature selection using neighborhood component analysis for classification
`oobPermutedPredictorImportance`	Out-of-bag predictor importance estimates for random forest of classification trees by permutation
`permutationImportance`	Predictor importance by permutation (since R2024a)
`predictorImportance`	Estimates of predictor importance for classification tree
`predictorImportance`	Estimates of predictor importance for classification ensemble of decision trees
`relieff`	Rank importance of predictors using ReliefF or RReliefF algorithm
`selectFeatures`	Select important features for NCA classification or regression (since R2023b)
`sequentialfs`	Sequential feature selection using custom criterion

Feature Engineering

`gencfeatures`	Perform automated feature engineering for classification (since R2021a)
`describe`	Describe generated features (since R2021a)
`transform`	Transform new data using generated features (since R2021a)

Automated Model Selection

`fitcauto`	Automatically select classification model with optimized hyperparameters (since R2020a)

Hyperparameter Optimization

`bayesopt`	Select optimal machine learning hyperparameters using Bayesian optimization
`hyperparameters`	Variable descriptions for optimizing a fit function
`optimizableVariable`	Variable description for `bayesopt` or other optimizers

Cross-Validation

`crossval`	Estimate loss using cross-validation
`cvpartition`	Partition data for cross-validation
`repartition`	Repartition data for cross-validation
`test`	Test indices for cross-validation
`training`	Training indices for cross-validation

Model Interpretation

Local Interpretable Model-Agnostic Explanations (LIME)

`lime`	Local interpretable model-agnostic explanations (LIME) (since R2020b)
`fit`	Fit simple model of local interpretable model-agnostic explanations (LIME) (since R2020b)
`plot`	Plot results of local interpretable model-agnostic explanations (LIME) (since R2020b)

Shapley Values

`shapley`	Shapley values (since R2021a)
`fit`	Compute Shapley values for query points (since R2021a)
`plot`	Plot Shapley values using bar graphs (since R2021a)
`boxchart`	Visualize Shapley values using box charts (box plots) (since R2024a)
`swarmchart`	Visualize Shapley values using swarm scatter charts (since R2024a)

Partial Dependence

`partialDependence`	Compute partial dependence (since R2020b)
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots

Classification Performance Evaluation

Confusion Matrix

`confusionchart`	Create confusion matrix chart for classification problem
`confusionmat`	Compute confusion matrix for classification problem

Receiver Operating Characteristic (ROC) Curve

`rocmetrics`	Receiver operating characteristic (ROC) curve and performance metrics for binary and multiclass classifiers (since R2022a)
`addMetrics`	Compute additional classification performance metrics (since R2022a)
`average`	Compute performance metrics for average receiver operating characteristic (ROC) curve in multiclass problem (since R2022a)
`plot`	Plot receiver operating characteristic (ROC) curves and other performance curves (since R2022a)
`perfcurve`	Receiver operating characteristic (ROC) curve or other performance curve for classifier output

Model Accuracy Comparison Tests

`testcholdout`	Compare predictive accuracies of two classification models
`testckfold`	Compare accuracies of two classification models by repeated cross-validation

Objects

Feature Selection

`FeatureSelectionNCAClassification`	Feature selection for classification using neighborhood component analysis (NCA)

Feature Engineering

`FeatureTransformer`	Generated feature transformations (since R2021a)

Hyperparameter Optimization

`BayesianOptimization`	Bayesian optimization results

Properties

ConfusionMatrixChart Properties	Confusion matrix chart appearance and behavior
ROCCurve Properties	Receiver operating characteristic (ROC) curve appearance and behavior (since R2022a)

Topics

Classification Learner App

Train Classification Models in Classification Learner App
Workflow for training, comparing and improving classification models, including automated, manual, and parallel training.
Visualize and Assess Classifier Performance in Classification Learner
Compare model accuracy values, visualize results by plotting class predictions, and check performance per class in the confusion matrix.
Feature Selection and Feature Transformation Using Classification Learner App
Identify useful predictors using plots or feature ranking algorithms, select features to include, and transform features using PCA in Classification Learner.

Feature Selection

Introduction to Feature Selection
Learn about feature selection algorithms and explore the functions available for feature selection.
Sequential Feature Selection
This topic introduces sequential feature selection and provides an example that selects features sequentially using a custom criterion and the sequentialfs function.
Neighborhood Component Analysis (NCA) Feature Selection
Neighborhood component analysis (NCA) is a non-parametric method for selecting features with the goal of maximizing prediction accuracy of regression and classification algorithms.
Tune Regularization Parameter to Detect Features Using NCA for Classification
This example shows how to tune the regularization parameter in fscnca using cross-validation.
Regularize Discriminant Analysis Classifier
Make a more robust and simpler model by removing predictors without compromising the predictive power of the model.
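Selecting Features for Classifying High-Dimensional Data
This example shows how to select features for classifying high-dimensional data. Specifically, it shows how to perform sequential feature selection, one of the most widely used feature selection algorithms. It also shows how to use holdout and cross-validation to evaluate the classification performance of the selected features.

Feature Engineering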

Automated Feature Engineering for Classification
Use gencfeatures to engineer new features before training a classification model. Before making predictions on new data, apply the same feature transformations to the new data set.

Automated Model Selection

Automated Classifier Selection with Bayesian and ASHA Optimization
Use fitcauto to automatically try a selection of classification model types with different hyperparameter values, given training predictor and response data.

Hyperparameter Optimization

Bayesian Optimization Workflow
Perform Bayesian optimization using a fit function or by calling bayesopt directly.
Variables for a Bayesian Optimization
Create variables for Bayesian optimization.
Bayesian Optimization Objective Functions
Create the objective function for Bayesian optimization.
Constraints in Bayesian Optimization
Set different types of constraints for Bayesian optimization.
Optimize Cross-Validated Classifier Using bayesopt
Minimize cross-validation loss using Bayesian Optimization.
Optimize Classifier Fit Using Bayesian Optimization
Minimize cross-validation loss using the OptimizeHyperparameters name-value argument in a fitting function.
Bayesian Optimization Plot Functions
Visually monitor a Bayesian optimization.
Bayesian Optimization Output Functions
Monitor a Bayesian optimization.
Bayesian Optimization Algorithm
Understand the underlying algorithms for Bayesian optimization.
Parallel Bayesian Optimization
How Bayesian optimization works in parallel.

Model Interpretation

Interpret Machine Learning Models
Explain model predictions using the lime and shapley objects and the plotPartialDependence function.
Shapley Values for Machine Learning Model
Compute Shapley values for a machine learning model using interventional algorithm or conditional algorithm.
Shapley Output Functions
Stop Shapley computations, create plots, save information to your workspace, or perform calculations while using shapley.

Cross-Validation

Implement Cross-Validation Using Parallel Computing
Speed up cross-validation using parallel computing.

Classification Performance Evaluation

ROC Curve and Performance Metrics
Use rocmetrics to examine the performance of a classification algorithm on a test data set.
Performance Curves by perfcurve
Learn how the perfcurve function computes a receiver operating characteristic (ROC) curve.
