Model wrapper for sklearn cross_val_score

xan

This is a minimal example using XGBClassifier, but I am interested in how this would work in general. I am trying to wrap the model class in order to use it in cross-validation. In this case I am only weighting the imbalanced classes, but my ultimate goal is a somewhat broader change to the pipeline.

My first try was to simply override the fit function:

from sklearn import metrics
from sklearn.utils.class_weight import compute_sample_weight
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.base import BaseEstimator, ClassifierMixin

class WeightedXGBClassifier(XGBClassifier, BaseEstimator, ClassifierMixin):
    
    @staticmethod
    def get_weights(y):
        sample_weights = compute_sample_weight(class_weight='balanced', y=y)
        return sample_weights
    
    def fit(self, X, y, **kwargs):
        weights = self.get_weights(y)
        super(XGBClassifier, self).fit(X, y, sample_weight=weights, **kwargs)

which works fine when I fit the model, make predictions, etc. But using this in sklearn's cross_val_score

xgb_model_cv = WeightedXGBClassifier(n_estimators=100, max_depth=4, alpha=100, use_label_encoder=False)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
auc_scorer = metrics.make_scorer(metrics.roc_auc_score, needs_proba=True)
scores = cross_val_score(xgb_model_cv, X, y, scoring=auc_scorer, cv=cv, n_jobs=-1, verbose=1)

throws an error

File "/home/ubuntu/anaconda3/envs/pyTF/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/ubuntu/anaconda3/envs/pyTF/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 106, in __call__
    score = scorer._score(cached_call, estimator, *args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pyTF/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 306, in _score
    y_pred = self._select_proba_binary(y_pred, clf.classes_)
AttributeError: 'WeightedXGBClassifier' object has no attribute 'classes_'

Now, it is my understanding that the classes_ attribute is created when the model is fitted, but I am not sure how to properly wrap the model so that it is captured. Note that running

model = XGBClassifier(use_label_encoder=False, scale_pos_weight=(~y).sum()/y.sum())
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)

works fine. My second try was:

class XGBClassifierWrapper(BaseEstimator, ClassifierMixin):
    def __init__(self, **kwargs):
#         super(BaseEstimator).__init__()
#         super(ClassifierMixin).__init__()
        self.xgb_classifier_obj = XGBClassifier(**kwargs)
    
    @staticmethod
    def get_weights(y):
        sample_weights = compute_sample_weight(class_weight='balanced', y=y)
        return sample_weights
    
    def fit(self, X, y, **kwargs):
        weights = self.get_weights(y)
        self.xgb_classifier_obj.fit(X, y, sample_weight=weights, **kwargs)
        return self
    
    def predict(self, X, **kwargs):
        return self.xgb_classifier_obj.predict(X, **kwargs)
    
    def predict_proba(self, X, **kwargs):
        return self.xgb_classifier_obj.predict_proba(X, **kwargs)

which again resulted in the same error as above, i.e., the missing classes_ attribute.

Ben Reiniger

(I don't actually get an error when I run your code; instead I get scores consisting only of nan, and adding error_score='raise' reproduces your error message.)

In the first approach, I believe the only real problem is the super call in fit. super(XGBClassifier, self) looks up the parent class of XGBClassifier, not XGBClassifier itself, as I assume you want. Replace it with the vanilla super() and everything works.

You should also add return self to the end of fit in your first attempt, but it's not important here. You can probably safely drop BaseEstimator and ClassifierMixin from the inheritance, since XGBClassifier already inherits from them.
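
Putting those pieces together, a minimal sketch of the fixed first approach (the calling code stays exactly as in the question):

    from sklearn.utils.class_weight import compute_sample_weight
    from xgboost import XGBClassifier

    class WeightedXGBClassifier(XGBClassifier):

        @staticmethod
        def get_weights(y):
            return compute_sample_weight(class_weight='balanced', y=y)

        def fit(self, X, y, **kwargs):
            weights = self.get_weights(y)
            # vanilla super() delegates to XGBClassifier.fit, which sets
            # classes_ and the other fitted attributes on self
            super().fit(X, y, sample_weight=weights, **kwargs)
            return self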

Your second (wrapper) approach fails because the wrapped xgb_classifier_obj has all the fitted attributes, including classes_, but your wrapper doesn't expose them directly. You can simply set self.classes_ = self.xgb_classifier_obj.classes_ in fit, or perhaps define a @property that delegates to it.
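
A minimal sketch of that fix, addressing only the classes_ problem (the cloning caveat below still applies); the @property alternative would instead return self.xgb_classifier_obj.classes_ rather than copying it in fit:

    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.utils.class_weight import compute_sample_weight
    from xgboost import XGBClassifier

    class XGBClassifierWrapper(BaseEstimator, ClassifierMixin):
        def __init__(self, **kwargs):
            self.xgb_classifier_obj = XGBClassifier(**kwargs)

        def fit(self, X, y, **kwargs):
            weights = compute_sample_weight(class_weight='balanced', y=y)
            self.xgb_classifier_obj.fit(X, y, sample_weight=weights, **kwargs)
            # expose the fitted attribute the scorer looks for
            self.classes_ = self.xgb_classifier_obj.classes_
            return self

        def predict(self, X, **kwargs):
            return self.xgb_classifier_obj.predict(X, **kwargs)

        def predict_proba(self, X, **kwargs):
            return self.xgb_classifier_obj.predict_proba(X, **kwargs)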

You should also consider that your __init__ in this second approach doesn't meet the sklearn API (parameters aren't stored as same-named attributes), so cloning won't work correctly. I'd advise using the first approach for this reason; fixing the wrapper properly requires rather more tedious work, in my opinion.
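
For what it's worth, that tedious fix would look roughly like the sketch below: every hyperparameter becomes an explicit __init__ argument stored unchanged under the same name, and the inner estimator is only built inside fit. The class name CloneSafeXGBWrapper is just illustrative, and only the parameters used in the question are covered.

    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.utils.class_weight import compute_sample_weight
    from xgboost import XGBClassifier

    class CloneSafeXGBWrapper(BaseEstimator, ClassifierMixin):
        # sklearn convention: __init__ only stores its arguments, under the
        # same names, so get_params / set_params / clone behave correctly
        def __init__(self, n_estimators=100, max_depth=4, alpha=100,
                     use_label_encoder=False):
            self.n_estimators = n_estimators
            self.max_depth = max_depth
            self.alpha = alpha
            self.use_label_encoder = use_label_encoder

        def fit(self, X, y):
            weights = compute_sample_weight(class_weight='balanced', y=y)
            # build the inner estimator here rather than in __init__
            self.model_ = XGBClassifier(
                n_estimators=self.n_estimators,
                max_depth=self.max_depth,
                alpha=self.alpha,
                use_label_encoder=self.use_label_encoder,
            )
            self.model_.fit(X, y, sample_weight=weights)
            self.classes_ = self.model_.classes_
            return self

        def predict(self, X):
            return self.model_.predict(X)

        def predict_proba(self, X):
            return self.model_.predict_proba(X)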
