I would like to know how to transform a table and get the expected result.
My sample dataset:
import pandas as pd

df=pd.DataFrame({
"ID":[111,111,111,111,222,222,222,333,333],
"Section":["CS01","CS01","IT01","IT01","CS02","CS02","CS02","HS01","HS01"],
"Subject":["Hist","Pol","Pol","Arts","Pol","Hist","Arts","Pol","Hist"],
"Activity":["Quiz 1","Quiz 2","Quiz 3","Quiz 1","Quiz 2","Quiz 3","Quiz 1","Quiz 2","Quiz 3"],
"Pass":[1,0,0,1,1,1,0,1,0],
})
It looks like this:
ID Section Subject Activity Pass
0 111 CS01 Hist Quiz 1 1
1 111 CS01 Pol Quiz 2 0
2 111 IT01 Pol Quiz 3 0
3 111 IT01 Arts Quiz 1 1
4 222 CS02 Pol Quiz 2 1
5 222 CS02 Hist Quiz 3 1
6 222 CS02 Arts Quiz 1 0
7 333 HS01 Pol Quiz 2 1
8 333 HS01 Hist Quiz 3 0
What I am trying to get is:
ID Section Subject Quiz 1 Quiz 2 Quiz 3
0 1 NA 0 1 NA 0 1 NA
111 CS01 Hist 0 1 0 0 0 1 0 0 1
111 CS01 Pol 0 0 1 1 0 0 0 0 1
111 IT01 Arts 0 1 0 0 0 1 0 0 1
111 IT01 Pol 0 0 1 0 0 1 1 0 0
222 CS02 Arts 1 0 0 0 0 0 0 0 0
222 CS02 Hist 0 0 1 0 0 1 0 1 0
222 CS02 Pol 0 0 1 0 1 0 0 0 1
333 HS01 Hist 0 0 1 0 0 1 1 0 0
333 HS01 Pol 0 0 1 0 1 0 0 0 1
What I want is to have "Activity" as the first column level and its "Pass" values as the second column level, including an "NA" column.
All I have so far is:
df.groupby(["ID","Section", "Subject","Activity"])["Pass"].value_counts().unstack().fillna(0)
But this has neither the level-2 "NA" column nor "Activity" as a column level.
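To see what the attempt actually produces, here is a minimal check on the sample data (assuming only that pandas is installed): only the observed Pass values 0 and 1 become columns, so there is no NA column, and Activity is still a row level rather than a column level.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [111, 111, 111, 111, 222, 222, 222, 333, 333],
    "Section": ["CS01", "CS01", "IT01", "IT01", "CS02", "CS02", "CS02", "HS01", "HS01"],
    "Subject": ["Hist", "Pol", "Pol", "Arts", "Pol", "Hist", "Arts", "Pol", "Hist"],
    "Activity": ["Quiz 1", "Quiz 2", "Quiz 3", "Quiz 1", "Quiz 2", "Quiz 3", "Quiz 1", "Quiz 2", "Quiz 3"],
    "Pass": [1, 0, 0, 1, 1, 1, 0, 1, 0],
})

# The attempt from the question: count Pass values per group, pivot Pass into columns
attempt = (df.groupby(["ID", "Section", "Subject", "Activity"])["Pass"]
             .value_counts()
             .unstack()
             .fillna(0))

# Only the observed Pass values become columns, so there is no NA column
print(list(attempt.columns))      # [0, 1]
# Activity is still an index level, not a column level
print(list(attempt.index.names))  # ['ID', 'Section', 'Subject', 'Activity']
```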
The idea is to first create all possible combinations in the MultiIndex with Series.reindex and MultiIndex.from_product, and then apply your solution using value_counts with dropna=False:
s = df.set_index(["ID","Section", "Subject","Activity"])["Pass"]
df = (s.reindex(pd.MultiIndex.from_product(s.index.levels))
.groupby(level=[0,1,2,3])
.value_counts(dropna=False)
.unstack([3,4], fill_value=0)
.sort_index(axis=1))
print (df)
Activity Quiz 1 Quiz 2 Quiz 3
Pass 0.0 1.0 NaN 0.0 1.0 NaN 0.0 1.0 NaN
ID Section Subject
111 CS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 1 0 0 0 1 0 0 1
Pol 0 0 1 1 0 0 0 0 1
CS02 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
HS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
IT01 Arts 0 1 0 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 1 0 0
222 CS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
CS02 Arts 1 0 0 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 1 0
Pol 0 0 1 0 1 0 0 0 1
HS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
IT01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
333 CS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
CS02 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
HS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 1 0 0
Pol 0 0 1 0 1 0 0 0 1
IT01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
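The reindex step is what creates the missing rows. A minimal sketch on the same sample data: `MultiIndex.from_product(s.index.levels)` builds every ID × Section × Subject × Activity combination (3 × 4 × 3 × 3 = 108), the combinations absent from the data get `NaN`, and `value_counts(dropna=False)` then counts those `NaN` values instead of dropping them.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [111, 111, 111, 111, 222, 222, 222, 333, 333],
    "Section": ["CS01", "CS01", "IT01", "IT01", "CS02", "CS02", "CS02", "HS01", "HS01"],
    "Subject": ["Hist", "Pol", "Pol", "Arts", "Pol", "Hist", "Arts", "Pol", "Hist"],
    "Activity": ["Quiz 1", "Quiz 2", "Quiz 3", "Quiz 1", "Quiz 2", "Quiz 3", "Quiz 1", "Quiz 2", "Quiz 3"],
    "Pass": [1, 0, 0, 1, 1, 1, 0, 1, 0],
})

s = df.set_index(["ID", "Section", "Subject", "Activity"])["Pass"]

# Cartesian product of the unique values of every index level
full = pd.MultiIndex.from_product(s.index.levels)
print(len(full))            # 108 = 3 IDs x 4 Sections x 3 Subjects x 3 Activities

# Combinations that do not occur in the data are filled with NaN
s_full = s.reindex(full)
print(s_full.isna().sum())  # 99 = 108 - 9 observed rows

# dropna=False keeps NaN as its own value, so every combination is counted once
counts = s_full.groupby(level=[0, 1, 2, 3]).value_counts(dropna=False)
print(counts.sum())         # 108
```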
EDIT: a solution that also works with duplicates:
df=pd.DataFrame({
"ID":[111,111,111,111,222,222,222,333,333],
"Section":["CS01","CS01","IT01","IT01","CS02","CS02","CS02","HS01","HS01"],
"Subject":["Hist","Pol","Pol","Arts","Pol","Hist","Arts","Pol","Hist"],
"Activity":["Quiz 1","Quiz 2","Quiz 3","Quiz 1","Quiz 2","Quiz 3","Quiz 1","Quiz 2","Quiz 3"],
"Pass":[1,0,0,1,1,1,0,1,0],
})
df = pd.concat([df, df.head()])
print (df)
ID Section Subject Activity Pass
0 111 CS01 Hist Quiz 1 1
1 111 CS01 Pol Quiz 2 0
2 111 IT01 Pol Quiz 3 0
3 111 IT01 Arts Quiz 1 1
4 222 CS02 Pol Quiz 2 1
5 222 CS02 Hist Quiz 3 1
6 222 CS02 Arts Quiz 1 0
7 333 HS01 Pol Quiz 2 1
8 333 HS01 Hist Quiz 3 0
0 111 CS01 Hist Quiz 1 1 <- duplicates
1 111 CS01 Pol Quiz 2 0 <- duplicates
2 111 IT01 Pol Quiz 3 0 <- duplicates
3 111 IT01 Arts Quiz 1 1 <- duplicates
4 222 CS02 Pol Quiz 2 1 <- duplicates
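A likely reason the first solution needs replacing here: once rows are duplicated, the (ID, Section, Subject, Activity) index is no longer unique, and pandas refuses to reindex a non-unique index. A minimal check, assuming the first solution is applied unchanged to the duplicated data (the exact error message differs between pandas versions):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [111, 111, 111, 111, 222, 222, 222, 333, 333],
    "Section": ["CS01", "CS01", "IT01", "IT01", "CS02", "CS02", "CS02", "HS01", "HS01"],
    "Subject": ["Hist", "Pol", "Pol", "Arts", "Pol", "Hist", "Arts", "Pol", "Hist"],
    "Activity": ["Quiz 1", "Quiz 2", "Quiz 3", "Quiz 1", "Quiz 2", "Quiz 3", "Quiz 1", "Quiz 2", "Quiz 3"],
    "Pass": [1, 0, 0, 1, 1, 1, 0, 1, 0],
})
df = pd.concat([df, df.head()])  # the first five rows now appear twice

s = df.set_index(["ID", "Section", "Subject", "Activity"])["Pass"]
print(s.index.is_unique)  # False

try:
    s.reindex(pd.MultiIndex.from_product(s.index.levels))
    raised = False
except ValueError as exc:
    # pandas raises ValueError when asked to reindex a non-unique index
    raised = True
    print("ValueError:", exc)
```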
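First count the values with SeriesGroupBy.value_counts and reshape only the last level (Pass) into columns with Series.unstack. Then add all possible combinations of the index levels with DataFrame.reindex, and add a NaN column filled with 1 where both Pass columns are 0, tested with DataFrame.eq and DataFrame.all. Finally, unstack to MultiIndex columns, swap the order of the column levels with DataFrame.swaplevel and sort the MultiIndex with DataFrame.sort_index: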
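import numpy as np
import pandas as pd
# Count Pass values per group and move only the last level (Pass) into columns
df1 = (df.groupby(["ID","Section", "Subject","Activity"])["Pass"]
         .value_counts()
         .unstack(fill_value=0))
# Add every combination of the index levels; missing ones get 0 counts
df1 = df1.reindex(pd.MultiIndex.from_product(df1.index.levels), fill_value=0)
# NaN column: 1 if both Pass columns are 0 (no result for that combination)
df1[np.nan] = df1.eq(0).all(axis=1).astype(int)
# Move Activity into the columns, put it above Pass and sort
df1 = df1.unstack().swaplevel(1,0, axis=1).sort_index(axis=1)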
print (df1)
Activity Quiz 1 Quiz 2 Quiz 3
Pass 0.0 1.0 NaN 0.0 1.0 NaN 0.0 1.0 NaN
ID Section Subject
111 CS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 2 0 0 0 1 0 0 1
Pol 0 0 1 2 0 0 0 0 1
CS02 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
HS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
IT01 Arts 0 2 0 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 2 0 0
222 CS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
CS02 Arts 1 0 0 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 1 0
Pol 0 0 1 0 2 0 0 0 1
HS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
IT01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
333 CS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
CS02 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
HS01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 1 0 0
Pol 0 0 1 0 1 0 0 0 1
IT01 Arts 0 0 1 0 0 1 0 0 1
Hist 0 0 1 0 0 1 0 0 1
Pol 0 0 1 0 0 1 0 0 1
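The NaN-column step in isolation: after `unstack(fill_value=0)` each row holds counts for Pass values 0 and 1, and a row where both counts are 0 is a combination that has no result at all. A minimal sketch on a hypothetical toy frame (the row labels and counts are made up for illustration, not taken from the data above), converting the boolean mask to integers with `astype(int)`:

```python
import numpy as np
import pandas as pd

# Hypothetical counts: rows are combinations, columns are the Pass values 0 and 1
counts = pd.DataFrame(
    {0: [1, 0, 0], 1: [0, 2, 0]},
    index=pd.Index(["failed once", "passed twice", "no result"], name="combo"),
)
counts.columns.name = "Pass"

# 1 where every count in the row is 0, otherwise 0
counts[np.nan] = counts.eq(0).all(axis=1).astype(int)
print(counts)
```

Only the "no result" row gets a 1 in the NaN column; the rows with at least one counted result get 0, which is the same rule the answer uses to fill the NaN column for combinations added by `reindex`.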