My DataFrame:
from random import random, randint
from pandas import DataFrame
t = DataFrame({"metasearch": ["A", "B", "A", "B", "A", "B", "A", "B"],
               "market": ["A", "B", "A", "B", "A", "B", "A", "B"],
               "bid": [random() for i in range(8)],
               "clicks": [randint(0, 10) for i in range(8)],
               "country_code": ["A", "A", "A", "A", "A", "B", "A", "B"]})
I want to fit a LinearRegression for each market, so I:
1) Group the df: groups = t.groupby(by="market")
2) Prepare a function that fits the model on a group:
from sklearn.linear_model import LinearRegression
def group_fitter(group):
    lr = LinearRegression()
    X = group["bid"].fillna(0).values.reshape(-1, 1)
    y = group["clicks"].fillna(0)
    lr.fit(X, y)
    return lr.coef_[0]  # THIS IS A SCALAR
3) Create a new Series whose index is market and whose values are the coef:
s = groups.transform(group_fitter)
But step 3 fails: KeyError: ('bid_cpc', 'occurred at index bid')
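A KeyError of this kind is what one would expect if group_fitter receives a single column (a Series) rather than the group's whole sub-frame, which, as far as I understand it, is how transform calls a user function. The minimal sketch below only illustrates that difference; the values and the names bid_column and frame are made up for the illustration.

```python
import pandas as pd

# One column of a group, as a plain Series (hypothetical values).
bid_column = pd.Series([0.46, 0.05, 0.58], name="bid")

try:
    # On a Series, ["bid"] is a label lookup in the index (0, 1, 2),
    # not a column selection, so the label is not found.
    bid_column["bid"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# On a DataFrame, the same expression selects the column and works.
frame = pd.DataFrame({"bid": [0.46, 0.05, 0.58], "clicks": [9, 9, 0]})
bid_values = frame["bid"].tolist()

print(lookup_failed, bid_values)
```

So a function that indexes by column name, like group_fitter, needs to be handed the whole group.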
I think you need apply instead of transform, because the function works with several columns together; then use join to add the result as a new column:
from sklearn.linear_model import LinearRegression
def group_fitter(group):
    lr = LinearRegression()
    X = group["bid"].fillna(0).values.reshape(-1, 1)
    y = group["clicks"].fillna(0)
    lr.fit(X, y)
    return lr.coef_[0]  # THIS IS A SCALAR
groups = t.groupby(by="market")
# apply returns one coefficient per market; join broadcasts it back to the rows
df = t.join(groups.apply(group_fitter).rename('new'), on='market')
print(df)
        bid  clicks  country_code  market  metasearch        new
0  0.462734       9             A       A           A  -8.632301
1  0.438869       5             A       B           B   6.690289
2  0.047160       9             A       A           A  -8.632301
3  0.644263       0             A       B           B   6.690289
4  0.579040       0             A       A           A  -8.632301
5  0.820389       6             B       B           B   6.690289
6  0.112341       5             A       A           A  -8.632301
7  0.432502       0             B       B           B   6.690289
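Because the frame above is built from random numbers, the coefficients change on every run. Here is a small sketch of the same apply-then-broadcast pattern on fixed data, so the result can be checked by hand. It rests on a few assumptions that are not part of the answer: the data is invented, np.polyfit stands in for LinearRegression purely to keep the sketch free of scikit-learn (for a single feature with an intercept both give the ordinary least-squares slope), and Series.map is used in place of join as an equivalent way to broadcast one value per market.

```python
import numpy as np
import pandas as pd

# Invented data: clicks = 2 * bid in market A, clicks = 6 - bid in market B.
df = pd.DataFrame({
    "market": ["A", "B", "A", "B", "A", "B"],
    "bid":    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    "clicks": [2.0, 5.0, 4.0, 4.0, 6.0, 3.0],
})

def slope(group):
    # Degree-1 least-squares fit; element 0 is the slope.
    # Stand-in for LinearRegression().fit(X, y).coef_[0].
    return np.polyfit(group["bid"], group["clicks"], 1)[0]

# apply sees each group's sub-frame, so both columns are available.
slopes = df.groupby("market")[["bid", "clicks"]].apply(slope)

# Broadcast the per-market value back onto the rows.
df["slope"] = df["market"].map(slopes)
print(df)
```

With this data the slope comes out as approximately 2 for market A and -1 for market B, repeated on every row of the respective market.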