Python pandas: create a new column through groupby using a custom agg function

Ladenkov Vladislav

My DataFrame:

from random import random, randint
from pandas import DataFrame

t = DataFrame({"metasearch":["A","B","A","B","A","B","A","B"],
                   "market":["A","B","A","B","A","B","A","B"],
                   "bid":[random() for i in range(8)],
                   "clicks": [randint(0,10) for i in range(8)],
                   "country_code":["A","A","A","A","A","B","A","B"]})

I want to fit a LinearRegression for each market, so I:

1) Group the df: groups = t.groupby(by="market")

2) Prepare a function that fits the model:

from sklearn.linear_model import LinearRegression
def group_fitter(group):
    lr = LinearRegression()
    X = group["bid"].fillna(0).values.reshape(-1,1)
    y = group["clicks"].fillna(0)
    lr.fit(X, y)
    return lr.coef_[0] # THIS IS A SCALAR

3) Create a new Series with market as its index and the coef as its values:

s = groups.transform(group_fitter)

But step 3 fails: KeyError: ('bid_cpc', 'occurred at index bid')

jezrael

I think you need apply instead of transform, because the function uses several columns together. To add the result as a new column, use join:

from sklearn.linear_model import LinearRegression
def group_fitter(group):
    lr = LinearRegression()
    X = group["bid"].fillna(0).values.reshape(-1,1)
    y = group["clicks"].fillna(0)
    lr.fit(X, y)
    return lr.coef_[0] # THIS IS A SCALAR

groups = t.groupby(by="market")
df = t.join(groups.apply(group_fitter).rename('new'), on='market')
print (df) 
        bid  clicks country_code market metasearch       new
0  0.462734       9            A      A          A -8.632301
1  0.438869       5            A      B          B  6.690289
2  0.047160       9            A      A          A -8.632301
3  0.644263       0            A      B          B  6.690289
4  0.579040       0            A      A          A -8.632301
5  0.820389       6            B      B          B  6.690289
6  0.112341       5            A      A          A -8.632301
7  0.432502       0            B      B          B  6.690289
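
Since the frame above is random, here is a small deterministic sketch of the same apply-then-broadcast pattern that can be checked by hand. Everything in it is an assumption for illustration: the demo frame and its values are made up, the helper name slope is hypothetical, and np.polyfit stands in for LinearRegression so that only pandas and NumPy are needed (for a single feature, the degree-1 slope is the same least-squares coefficient). It also uses Series.map in place of join, which should give the same column here.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data, chosen so the per-market slopes are easy to verify:
# market A: clicks = 2 * bid, market B: clicks = 4 - bid
demo = pd.DataFrame({
    "market": ["A", "B", "A", "B", "A", "B"],
    "bid":    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    "clicks": [2, 3, 4, 2, 6, 1],
})

def slope(group):
    # group is a DataFrame holding the rows of one market;
    # a degree-1 least-squares fit returns [slope, intercept]
    return np.polyfit(group["bid"], group["clicks"], 1)[0]

# Select only the columns the function needs, so it never touches
# the grouping column; the result is a Series indexed by market
coefs = demo.groupby("market")[["bid", "clicks"]].apply(slope)

# Broadcast each market's coefficient back onto its rows
demo["new"] = demo["market"].map(coefs)
print(demo)
```

With this data the coefficient should come out at about 2 for market A and about -1 for market B, repeated on every row of the corresponding market. Selecting the columns before apply is a design choice: it keeps the function independent of the grouping column, which, as far as I know, newer pandas versions warn about when it is passed into apply.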
