How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn?

new_with_python:

I am working on a sentiment analysis problem, and the data looks like this:

label instances
    5    1190
    4     838
    3     239
    1     204
    2     127

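So my data is unbalanced, since 1190 instances are labeled with 5. For the classification I'm using scikit's SVC. The problem is that I do not know how to balance my data in the right way in order to accurately compute the precision, recall, accuracy and f1-score for the multiclass case. So I tried the following approaches:

First: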

    wclf = SVC(kernel='linear', C= 1, class_weight={1: 10})
    wclf.fit(X, y)
    weighted_prediction = wclf.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, weighted_prediction)
print 'F1 score:', f1_score(y_test, weighted_prediction,average='weighted')
print 'Recall:', recall_score(y_test, weighted_prediction,
                              average='weighted')
print 'Precision:', precision_score(y_test, weighted_prediction,
                                    average='weighted')
print '\n clasification report:\n', classification_report(y_test, weighted_prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, weighted_prediction)

Second:

auto_wclf = SVC(kernel='linear', C= 1, class_weight='auto')
auto_wclf.fit(X, y)
auto_weighted_prediction = auto_wclf.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, auto_weighted_prediction)

print 'F1 score:', f1_score(y_test, auto_weighted_prediction,
                            average='weighted')

print 'Recall:', recall_score(y_test, auto_weighted_prediction,
                              average='weighted')

print 'Precision:', precision_score(y_test, auto_weighted_prediction,
                                    average='weighted')

print '\n clasification report:\n', classification_report(y_test,auto_weighted_prediction)

print '\n confussion matrix:\n',confusion_matrix(y_test, auto_weighted_prediction)

Third:

clf = SVC(kernel='linear', C= 1)
clf.fit(X, y)
prediction = clf.predict(X_test)


from sklearn.metrics import precision_score, \
    recall_score, confusion_matrix, classification_report, \
    accuracy_score, f1_score

print 'Accuracy:', accuracy_score(y_test, prediction)
print 'F1 score:', f1_score(y_test, prediction)
print 'Recall:', recall_score(y_test, prediction)
print 'Precision:', precision_score(y_test, prediction)
print '\n clasification report:\n', classification_report(y_test,prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, prediction)


F1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1082: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
 0.930416613529

However, I'm getting warnings like this:

/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172:
DeprecationWarning: The default `weighted` averaging is deprecated,
and from version 0.18, use of precision, recall or F-score with 
multiclass or multilabel data or pos_label=None will result in an 
exception. Please set an explicit value for `average`, one of (None, 
'micro', 'macro', 'weighted', 'samples'). In cross validation use, for 
instance, scoring="f1_weighted" instead of scoring="f1"

How can I deal correctly with my unbalanced data in order to compute the classifier's metrics in the right way?

读取：

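I think there is a lot of confusion about which weights are used for what. I am not sure I know precisely what bothers you, so I am going to cover several topics; bear with me ;).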

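Class weights

The weights from the class_weight parameter are used to train the classifier. They are not used in the calculation of any of the metrics you are using: with different class weights, the numbers will be different simply because the classifier is different.

Basically, in every scikit-learn classifier, the class weights are used to tell your model how important a class is. That means that during training, the classifier will make an extra effort to correctly classify the classes with high weights.
How it does that is algorithm-specific. If you want details about how it works for SVC and the documentation does not make sense to you, feel free to mention it.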
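
To make the role of class_weight a bit more concrete, here is a minimal sketch (not your actual pipeline). It rebuilds a label vector with the counts from the question and asks scikit-learn for 'balanced' weights through compute_class_weight. It assumes a recent scikit-learn release, where, as far as I know, 'balanced' is the value that replaced the 'auto' option used in the question; the variable names are made up for the illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Label counts copied from the question: label -> number of instances.
counts = {5: 1190, 4: 838, 3: 239, 1: 204, 2: 127}

# Rebuild a label vector with the same distribution (illustration only).
y = np.concatenate([np.full(n, label) for label, n in counts.items()])
classes = np.unique(y)  # sorted labels: 1, 2, 3, 4, 5

# 'balanced' weights are n_samples / (n_classes * count_of_class),
# so rare classes get large weights and frequent classes small ones.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}

for label in sorted(class_weight):
    print(label, round(class_weight[label], 3))
```

A dictionary like this could be passed as SVC(kernel='linear', C=1, class_weight=class_weight). It changes how the model is trained, not how the metrics are computed.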

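Metrics

Once you have a classifier, you want to know how well it performs. Here you can use the metrics you mentioned: accuracy, recall_score, f1_score...

In general, when the class distribution is unbalanced, accuracy is considered a poor choice because it gives high scores to models that just predict the most frequent class.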
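
As a rough illustration of that point, the sketch below scores a "model" that always predicts label 5, using the label counts from the question. The label vector is reconstructed from those counts, the constant predictor is purely hypothetical, and zero_division=0 assumes a reasonably recent scikit-learn version.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Label counts copied from the question: label -> number of instances.
counts = {5: 1190, 4: 838, 3: 239, 1: 204, 2: 127}
y_true = np.concatenate([np.full(n, label) for label, n in counts.items()])

# A "model" that ignores its input and always predicts the most frequent class.
y_pred = np.full_like(y_true, 5)

acc = accuracy_score(y_true, y_pred)
# zero_division=0 silences the warning for classes that are never predicted.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print("accuracy:", round(acc, 3))
print("macro f1:", round(macro_f1, 3))
```

Accuracy comes out at 1190/2598, roughly 0.46, even though nothing was learned, while the macro-averaged f1-score is much lower because four of the five classes are never predicted. With a more skewed distribution the gap would be even larger.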

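I will not go into detail on all these metrics, but note that, with the exception of accuracy, they are naturally applied at the class level: as you can see in this print of a classification report, they are defined for each class. They rely on concepts such as true positives or false negatives, which require defining which class is the positive one.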

             precision    recall  f1-score   support

          0       0.65      1.00      0.79        17
          1       0.57      0.75      0.65        16
          2       0.33      0.06      0.10        17
avg / total       0.52      0.60      0.51        50
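
To show what "defining which class is the positive one" means in practice, here is a small sketch on made-up labels (the arrays are hypothetical, not your data). Each class in turn is treated as the positive class and everything else as negative, and the result is compared with what scikit-learn returns for average=None.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Toy labels, invented for the illustration.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 0, 0, 0, 1, 2, 1, 1, 0, 2])

labels = np.unique(y_true)
precision, recall = [], []
for label in labels:
    # One-vs-rest view: `label` is the positive class, the rest is negative.
    true_is_pos = y_true == label
    pred_is_pos = y_pred == label
    tp = np.sum(true_is_pos & pred_is_pos)   # true positives
    fp = np.sum(~true_is_pos & pred_is_pos)  # false positives
    fn = np.sum(true_is_pos & ~pred_is_pos)  # false negatives
    precision.append(tp / (tp + fp))
    recall.append(tp / (tp + fn))
    print(label, round(precision[-1], 2), round(recall[-1], 2))

# average=None asks scikit-learn for one value per class, with no averaging.
sk_precision = precision_score(y_true, y_pred, average=None)
sk_recall = recall_score(y_true, y_pred, average=None)
```

The hand-computed lists should match sk_precision and sk_recall, which are the per-class rows of a classification report.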

Warning

F1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The 
default `weighted` averaging is deprecated, and from version 0.18, 
use of precision, recall or F-score with multiclass or multilabel data  
or pos_label=None will result in an exception. Please set an explicit 
value for `average`, one of (None, 'micro', 'macro', 'weighted', 
'samples'). In cross validation use, for instance, 
scoring="f1_weighted" instead of scoring="f1".

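You get this warning because you are using the f1-score, recall and precision without defining how they should be computed! The question could be rephrased: from the above classification report, how do you output one global number for the f1-score? You could:

1. Take the average of the f1-score for each class, also known as macro averaging. The avg / total row above is close to it here because the three classes have nearly equal support; strictly speaking, that row is the support-weighted average (option 3).
2. Compute the f1-score using the global counts of true positives, false negatives, etc. (you sum the number of true positives and false negatives over all classes). Also known as micro averaging.
3. Compute a weighted average of the f1-scores. Using 'weighted' in scikit-learn weighs the f1-score of each class by its support: the more elements a class has, the more its f1-score counts in the computation.

These are 3 of the options in scikit-learn, and the warning is there to say that you have to pick one. So you have to specify an average argument for the score functions.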
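
The three options above can be sketched by hand and checked against f1_score. This is only an illustration on invented labels; the arrays and variable names are assumptions, not part of your problem.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Toy labels, invented for the illustration.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 0, 0, 0, 1, 2, 1, 1, 0, 2])

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted
tp = np.diag(cm)                       # per-class true positives
fp = cm.sum(axis=0) - tp               # per-class false positives
fn = cm.sum(axis=1) - tp               # per-class false negatives
support = cm.sum(axis=1)               # number of true instances per class

f1_per_class = 2 * tp / (2 * tp + fp + fn)

# 1. Macro: plain mean of the per-class scores.
macro = f1_per_class.mean()
# 2. Micro: pool the counts over all classes, then compute a single f1.
micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
# 3. Weighted: mean of the per-class scores, weighted by support.
weighted = np.average(f1_per_class, weights=support)

print("macro:", round(macro, 3))
print("micro:", round(micro, 3))
print("weighted:", round(weighted, 3))
```

For single-label multiclass data like this, the micro-averaged f1-score is the same number as accuracy, which is one reason macro or weighted averaging is usually more informative on unbalanced classes.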

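Which one you choose depends on how you want to measure the performance of the classifier: for instance, macro averaging does not take class imbalance into account, so the f1-score of class 1 will be just as important as the f1-score of class 5. If you use weighted averaging, however, class 5 will carry more weight.

The whole argument specification in these metrics is not very clear in scikit-learn right now; according to the docs, it will get better in version 0.18. They are removing some non-obvious default behavior and issuing warnings so that developers notice it.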

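Computing scores

The last thing I want to mention (feel free to skip it if you already know it) is that scores are only meaningful if they are computed on data that the classifier has never seen. This is extremely important, because any score you get on the data that was used to fit the classifier is completely irrelevant.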
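
A quick way to see why is to compare the score on the training data with the score on held-out data. The sketch below deliberately swaps in an unpruned DecisionTreeClassifier instead of your SVC, because a tree memorizes its training set and makes the effect obvious; the data is synthetic and all names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data, generated only for this illustration.
X, y = make_classification(n_samples=300, n_informative=10, n_classes=3,
                           random_state=0)

# Hold out half of the data; stratify=y keeps the label distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# An unpruned tree can fit its training set perfectly.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_f1 = f1_score(y_train, tree.predict(X_train), average="macro")
test_f1 = f1_score(y_test, tree.predict(X_test), average="macro")

print("macro f1 on training data:", round(train_f1, 3))
print("macro f1 on unseen data:  ", round(test_f1, 3))
```

The training score is perfect by construction, so it says nothing about how the model behaves on new data; only the held-out score does.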

Here's a way to do it using StratifiedShuffleSplit, which gives you random splits of your data (after shuffling) that preserve the label distribution.

from sklearn.datasets import make_classification
# In recent scikit-learn releases StratifiedShuffleSplit lives in
# sklearn.model_selection (older releases had it in sklearn.cross_validation).
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
from sklearn.svm import SVC

# We use a utility to generate artificial classification data.
X, y = make_classification(n_samples=100, n_informative=10, n_classes=3)
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
svc = SVC(kernel='linear', C=1)  # same classifier settings as in the question
for train_idx, test_idx in sss.split(X, y):
    X_train, X_test, y_train, y_test = X[train_idx], X[test_idx], y[train_idx], y[test_idx]
    # Fit on the training half only, then score on the held-out half.
    svc.fit(X_train, y_train)
    y_pred = svc.predict(X_test)
    print(f1_score(y_test, y_pred, average="macro"))
    print(precision_score(y_test, y_pred, average="macro"))
    print(recall_score(y_test, y_pred, average="macro"))
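
The warning also mentions cross-validation and scoring="f1_weighted". As a hedged sketch of what that could look like, assuming a recent scikit-learn with sklearn.model_selection, the averaging choice is passed through the scoring string; the synthetic data and the 5-fold setup are illustrative choices, not something from your problem.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic 3-class data, generated only for this illustration.
X, y = make_classification(n_samples=150, n_informative=10, n_classes=3,
                           random_state=0)

# Stratified folds keep the label distribution in every train/test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
svc = SVC(kernel="linear", C=1)

# The averaging mode is part of the scorer name: "f1_macro", "f1_weighted", ...
scores_macro = cross_val_score(svc, X, y, cv=cv, scoring="f1_macro")
scores_weighted = cross_val_score(svc, X, y, cv=cv, scoring="f1_weighted")

print("f1_macro per fold:   ", scores_macro.round(3))
print("f1_weighted per fold:", scores_weighted.round(3))
```

Each fold is scored on data the classifier was not fitted on, so the per-fold numbers can be averaged to get a more stable estimate than a single split.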

Hope this helps.

Edited on 2020-07-7