How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn?

new_with_python:

I'm working on a sentiment analysis problem where the data looks like this:

label instances
    5    1190
    4     838
    3     239
    1     204
    2     127

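So my data is unbalanced, since 1190 instances are labeled with 5. For the classification I'm using scikit's SVC. The problem is that I do not know how to balance my data in the right way in order to accurately compute the precision, recall, accuracy and f1-score for the multiclass case. So I tried the following approaches: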

First:

wclf = SVC(kernel='linear', C= 1, class_weight={1: 10})
wclf.fit(X, y)
weighted_prediction = wclf.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, weighted_prediction)
print 'F1 score:', f1_score(y_test, weighted_prediction,average='weighted')
print 'Recall:', recall_score(y_test, weighted_prediction,
                              average='weighted')
print 'Precision:', precision_score(y_test, weighted_prediction,
                                    average='weighted')
print '\n clasification report:\n', classification_report(y_test, weighted_prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, weighted_prediction)

Second:

auto_wclf = SVC(kernel='linear', C= 1, class_weight='auto')
auto_wclf.fit(X, y)
auto_weighted_prediction = auto_wclf.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, auto_weighted_prediction)

print 'F1 score:', f1_score(y_test, auto_weighted_prediction,
                            average='weighted')

print 'Recall:', recall_score(y_test, auto_weighted_prediction,
                              average='weighted')

print 'Precision:', precision_score(y_test, auto_weighted_prediction,
                                    average='weighted')

print '\n clasification report:\n', classification_report(y_test,auto_weighted_prediction)

print '\n confussion matrix:\n',confusion_matrix(y_test, auto_weighted_prediction)

Third:

clf = SVC(kernel='linear', C= 1)
clf.fit(X, y)
prediction = clf.predict(X_test)


from sklearn.metrics import precision_score, \
    recall_score, confusion_matrix, classification_report, \
    accuracy_score, f1_score

print 'Accuracy:', accuracy_score(y_test, prediction)
print 'F1 score:', f1_score(y_test, prediction)
print 'Recall:', recall_score(y_test, prediction)
print 'Precision:', precision_score(y_test, prediction)
print '\n clasification report:\n', classification_report(y_test,prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, prediction)


F1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1082: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
 0.930416613529

However, I'm getting warnings like this:

/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172:
DeprecationWarning: The default `weighted` averaging is deprecated,
and from version 0.18, use of precision, recall or F-score with 
multiclass or multilabel data or pos_label=None will result in an 
exception. Please set an explicit value for `average`, one of (None, 
'micro', 'macro', 'weighted', 'samples'). In cross validation use, for 
instance, scoring="f1_weighted" instead of scoring="f1"

How can I deal correctly with my unbalanced data in order to compute the classifier's metrics in the right way?

Answer:

I think there is a lot of confusion about which weights are used for what. I am not sure I know precisely what bothers you, so I am going to cover different topics, bear with me ;).

Class weights

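The weights from the class_weight parameter are used to train the classifier. They are not used in the calculation of any of the metrics you are using: with different class weights, the numbers will be different simply because the classifier is different.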

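Basically, in every scikit-learn classifier that accepts them, the class weights are used to tell your model how important a class is. That means that during training, the classifier will make extra efforts to properly classify the classes with high weights.
How it does that is algorithm-specific. If you want details about how it works for SVC and the documentation does not make sense to you, feel free to mention it.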
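To make the class-weight idea concrete, here is a minimal sketch that derives "balanced" weights from the label counts given in the question. The label array is rebuilt from those counts only (the real features are not available), and compute_class_weight is used just to show the numbers scikit-learn would derive: n_samples / (n_classes * count).

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Label counts copied from the question (label -> number of instances).
counts = {5: 1190, 4: 838, 3: 239, 1: 204, 2: 127}

# Rebuild a label vector with that distribution (the features are not needed here).
y = np.concatenate([np.full(n, label) for label, n in counts.items()])

classes = np.unique(y)  # sorted labels: [1, 2, 3, 4, 5]
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes.tolist(), weights.tolist()))

# Rare classes get weights above 1, frequent classes get weights below 1.
for label in classes.tolist():
    print(label, round(class_weight[label], 3))
```

A dictionary like this can be passed as class_weight to SVC, or class_weight='balanced' can be passed directly in recent scikit-learn releases. Either way it only changes how the model is trained, not how the metrics are computed.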

The metrics

Once you have a classifier, you want to know how well it is performing. Here you can use the metrics you mentioned: accuracy, recall_score, f1_score...

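Usually when the class distribution is unbalanced, accuracy is considered a poor choice, as it gives high scores to models that just predict the most frequent class.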
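A small illustration of that point, using made-up labels (95 samples of one class, 5 of another, not the data from the question): a "model" that always predicts the majority class scores well on accuracy while finding none of the minority class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 95 samples of class 0 and 5 samples of class 1.
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts the most frequent class.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
# Macro recall averages the per-class recalls: class 0 gets 1.0, class 1 gets 0.0.
macro_recall = recall_score(y_true, y_pred, average="macro")

print("accuracy:", acc)
print("macro recall:", macro_recall)
```

The accuracy looks good only because class 0 dominates; the macro-averaged recall exposes that class 1 is never detected.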

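I will not detail all these metrics, but note that, with the exception of accuracy, they are naturally applied at the class level: as you can see in this print of a classification report, they are defined for each class. They rely on concepts such as true positives or false negatives, which require defining which class is the positive one.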

             precision    recall  f1-score   support

          0       0.65      1.00      0.79        17
          1       0.57      0.75      0.65        16
          2       0.33      0.06      0.10        17
avg / total       0.52      0.60      0.51        50
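To show where such per-class numbers come from, here is a rough sketch that derives precision and recall for each class from a confusion matrix, treating each class in turn as the positive one. The labels are invented for illustration and are unrelated to the report above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical 3-class ground truth and predictions.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 1, 2, 2, 0, 2])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

tp = np.diag(cm)          # correctly predicted samples of each class
fp = cm.sum(axis=0) - tp  # predicted as the class but belonging to another
fn = cm.sum(axis=1) - tp  # belonging to the class but predicted as another

precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("precision per class:", precision)
print("recall per class:", recall)

# The same values come out of scikit-learn when no averaging is requested.
print(precision_score(y_true, y_pred, average=None))
print(recall_score(y_true, y_pred, average=None))
```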

The warning

F1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The 
default `weighted` averaging is deprecated, and from version 0.18, 
use of precision, recall or F-score with multiclass or multilabel data  
or pos_label=None will result in an exception. Please set an explicit 
value for `average`, one of (None, 'micro', 'macro', 'weighted', 
'samples'). In cross validation use, for instance, 
scoring="f1_weighted" instead of scoring="f1".

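You get this warning because you are using the f1-score, recall and precision without defining how they should be computed! The question could be rephrased: from the above classification report, how do you output one global number for the f1-score? You could: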

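Take the average of the f1-score for each class. This is called macro averaging. With classes as evenly sized as in the report above, it lands on nearly the same number as the avg / total row, although as far as I know that row in this older report format is weighted by support.
Compute the f1-score using the global counts of true positives, false negatives, etc. (you sum the number of true positives / false negatives over the classes). This is also known as micro averaging.
Compute a weighted average of the f1-score. Using 'weighted' in scikit-learn weighs the f1-score by the support of the class: the more elements a class has, the more important the f1-score for this class is in the computation.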
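The three options can be reproduced by hand, which may help to see what each one measures. This is a sketch on invented, mildly imbalanced labels; the manual formulas are checked against f1_score with the matching average argument.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical labels: 6 samples of class 0, 3 of class 1, 2 of class 2.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 2, 0])

cm = confusion_matrix(y_true, y_pred)
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp
support = cm.sum(axis=1)  # number of true samples per class

# Per-class F1 written in terms of counts: 2TP / (2TP + FP + FN).
f1_per_class = 2 * tp / (2 * tp + fp + fn)

# Macro: plain mean of the per-class scores.
macro = f1_per_class.mean()
# Micro: pool the counts over all classes first, then compute one F1.
micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
# Weighted: mean of the per-class scores, weighted by class support.
weighted = np.average(f1_per_class, weights=support)

print("macro:", macro, f1_score(y_true, y_pred, average="macro"))
print("micro:", micro, f1_score(y_true, y_pred, average="micro"))
print("weighted:", weighted, f1_score(y_true, y_pred, average="weighted"))
```

For single-label multiclass data like this, the micro-averaged f1-score equals plain accuracy, which is one reason macro or weighted averaging is often preferred when classes are imbalanced.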

These are 3 of the options in scikit-learn, and the warning is there to say you have to pick one. So you have to specify an average argument for the score functions.
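In practice that could look like the sketch below. It assumes a recent scikit-learn release (where the splitting helpers live in sklearn.model_selection) and uses synthetic data from make_classification rather than the data from the question.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic 3-class data, only for illustration.
X, y = make_classification(n_samples=150, n_informative=10, n_classes=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

clf = SVC(kernel="linear", C=1).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Passing `average` explicitly removes the ambiguity the warning is about.
for average in ("micro", "macro", "weighted"):
    print(average, f1_score(y_test, y_pred, average=average))

# average=None returns one score per class instead of a single number.
per_class = f1_score(y_test, y_pred, average=None)
print("per class:", per_class)

# In cross-validation the averaging is chosen through the scoring string.
cv_scores = cross_val_score(SVC(kernel="linear", C=1), X, y,
                            scoring="f1_weighted", cv=3)
print("f1_weighted per fold:", cv_scores)
```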

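Which one you choose is up to how you want to measure the performance of the classifier: for instance, macro averaging does not take class imbalance into account, and the f1-score of class 1 will be just as important as the f1-score of class 5. If you use weighted averaging, however, class 5 will get more importance.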

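The whole argument specification in these metrics is not super clear in scikit-learn right now; it will get better in version 0.18 according to the docs. They are removing some non-obvious default behavior, and they are issuing warnings so that developers notice it.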

Computing scores

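The last thing I want to mention (feel free to skip it if you're aware of it) is that scores are only meaningful if they are computed on data that the classifier has never seen. This is extremely important, as any score you get on data that was used in fitting the classifier is completely irrelevant.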
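To see why scores on the training data are misleading, here is a small sketch. It swaps in a fully grown decision tree instead of an SVC, purely because a tree memorizes its training set and makes the effect obvious; the data is synthetic and a recent scikit-learn is assumed.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data, only for illustration.
X, y = make_classification(n_samples=200, n_informative=10, n_classes=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# An unrestricted tree keeps splitting until every training sample is
# classified correctly, so it effectively memorizes the training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = accuracy_score(y_train, tree.predict(X_train))
test_acc = accuracy_score(y_test, tree.predict(X_test))

print("accuracy on data seen during fit:", train_acc)
print("accuracy on held-out data:", test_acc)
```

The score on the training half is perfect by construction and says nothing about how the model generalizes; only the held-out score is informative.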

Here's a way to do it using StratifiedShuffleSplit, which gives you random splits of your data (after shuffling) that preserve the label distribution.

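from sklearn.datasets import make_classification
# In recent scikit-learn releases the splitters live in sklearn.model_selection
# (older releases used sklearn.cross_validation and StratifiedShuffleSplit(y, n_iter=1, ...)).
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
from sklearn.svm import SVC

# We use a utility to generate artificial classification data.
X, y = make_classification(n_samples=100, n_informative=10, n_classes=3)
# The classifier was not defined in the original snippet; a linear SVC like the one in the question is assumed.
svc = SVC(kernel='linear', C=1)
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
for train_idx, test_idx in sss.split(X, y):
    X_train, X_test, y_train, y_test = X[train_idx], X[test_idx], y[train_idx], y[test_idx]
    # Fit on the training half only, then score on the half the model has never seen.
    svc.fit(X_train, y_train)
    y_pred = svc.predict(X_test)
    print(f1_score(y_test, y_pred, average="macro"))
    print(precision_score(y_test, y_pred, average="macro"))
    print(recall_score(y_test, y_pred, average="macro"))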

Hope this helps.


Edited 2020-07-7
