import numpy as np
df = df.dropna(subset=['genres']).reset_index(drop=True)
splitted = df['genres'].str.split('|')
l = splitted.str.len()
x = df['gross'] / df['budget']
df = pd.DataFrame({x: np.repeat(df[x], l), 'genres':np.concatenate(splitted)})
d = {'mean':'Average Income'}
df1 = df.groupby('genres')[x].agg(['mean']).rename(columns=d)
df1.plot.bar()
plt.yscale("log")
plt.xlabel("Genre")
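I want to plot the mean of `x` for each genre (a single movie can have several genres, so I split them into individual genres), but I am not sure what is wrong with my code. The result is not what I want. I need some help.
I think that if you only need a single aggregation, it is more common to use groupby + mean: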
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'genres':['Comedy|Crime|Drama|Thriller','Comedy|Crime|Drama',
'Comedy|Crime','Drama|Thriller','Drama','Comedy|Crime'],
'gross':[10,20,30,40,50,60],
'budget':[3,4,5,3,2,5]})
df = df.dropna(subset=['genres']).reset_index(drop=True)
splitted = df['genres'].str.split('|')
l = splitted.str.len()
x = df['gross'] / df['budget']
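#a new column name (divided) is necessary as the dict key, and `df[x]` must be changed to `x`
#x.to_numpy() keeps the repeated values positional, so the new df gets a fresh 0..n-1 index
df = pd.DataFrame({'divided': np.repeat(x.to_numpy(), l), 'genres':np.concatenate(splitted)})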
print(df)
      divided    genres
0    3.333333    Comedy
1    3.333333     Crime
2    3.333333     Drama
3    3.333333  Thriller
4    5.000000    Comedy
5    5.000000     Crime
6    5.000000     Drama
7    6.000000    Comedy
8    6.000000     Crime
9   13.333333     Drama
10  13.333333  Thriller
11  25.000000     Drama
12  12.000000    Comedy
13  12.000000     Crime
#aggregate the new column (divided), not x, because we now work on the new df created by repeat
#mean() returns a Series, so name the result column in reset_index instead of rename(columns=...)
df1 = df.groupby('genres')['divided'].mean().reset_index(name='return')
df1.plot.bar(x='genres', y='return')
plt.yscale("log")
plt.xlabel("Genre")
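As an alternative to the np.repeat / np.concatenate step above, the same split-then-average idea can probably be written with `DataFrame.explode`, assuming pandas 0.25 or newer. This is only a sketch on the sample data from the answer; the names `exploded` and `mean_by_genre` are made up for illustration and are not part of the original code.

```python
import pandas as pd

# Sample data copied from the answer above
df = pd.DataFrame({
    'genres': ['Comedy|Crime|Drama|Thriller', 'Comedy|Crime|Drama',
               'Comedy|Crime', 'Drama|Thriller', 'Drama', 'Comedy|Crime'],
    'gross': [10, 20, 30, 40, 50, 60],
    'budget': [3, 4, 5, 3, 2, 5],
})

df = df.dropna(subset=['genres']).reset_index(drop=True)

# Compute the ratio per movie, turn each genre string into a list,
# then explode so that every (movie, genre) pair gets its own row.
exploded = (
    df.assign(divided=df['gross'] / df['budget'],
              genres=df['genres'].str.split('|'))
      .explode('genres')
)

# One mean per genre; the result is a Series indexed by genre
mean_by_genre = exploded.groupby('genres')['divided'].mean()
print(mean_by_genre)
```

The resulting Series can be plotted directly, for example with `mean_by_genre.plot.bar(logy=True)`, which should give the same log-scaled bar chart as the `plt.yscale("log")` call above.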