I have some long-format data that looks like this (see the recreation below):
>>> df
section subsection name topic score
0 A W zwphf a 0.802427
1 A W jcyyc a 0.404077
2 A W kucem a 0.367319
3 A X ldbxz a 0.554260
4 A X vkcqh a 0.265864
5 A X cvksn a 0.548099
6 B Y spghx a 0.472612
7 B Y cqokn a 0.577504
8 B Y wjsxg a 0.815309
9 B Z holoo a 0.459850
10 B Z lnihf a 0.667877
11 B Z wirhq a 0.138879
12 A W zwphf b 0.673711
13 A W jcyyc b 0.507962
14 A W kucem b 0.546055
15 A X ldbxz b 0.148214
16 A X vkcqh b 0.773320
17 A X cvksn b 0.791990
18 B Y spghx b 0.487480
19 B Y cqokn b 0.252534
20 B Y wjsxg b 0.237767
21 B Z holoo b 0.432981
22 B Z lnihf b 0.317932
23 B Z wirhq b 0.614401
I want to group by section + subsection + name + topic and unstack topic, but also show interspersed, nested "All" subtotal rows:
>>> result
section subsection name a b
0 A All All 0.490341 0.573542
1 A W All 0.524608 0.575909
2 A W jcyyc 0.404077 0.507962
3 A W kucem 0.367319 0.546055
4 A W zwphf 0.802427 0.673711
5 A X All 0.456074 0.571174
6 A X cvksn 0.548099 0.791990
7 A X ldbxz 0.554260 0.148214
8 A X vkcqh 0.265864 0.773320
9 B All All 0.522005 0.390516
10 B Y All 0.621808 0.325927
11 B Y cqokn 0.577504 0.252534
12 B Y spghx 0.472612 0.487480
13 B Y wjsxg 0.815309 0.237767
14 B Z All 0.422202 0.455104
15 B Z holoo 0.459850 0.432981
16 B Z lnihf 0.667877 0.317932
17 B Z wirhq 0.138879 0.614401
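The new rows are the subtotal rows, i.e. the ones with All in subsection and/or name.
The initial groupby on its own (without subtotals) looks like this: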
>>> df.groupby(['section', 'subsection', 'name', 'topic'])['score'].mean().unstack('topic')
topic a b
section subsection name
A W jcyyc 0.404077 0.507962
kucem 0.367319 0.546055
zwphf 0.802427 0.673711
X cvksn 0.548099 0.791990
ldbxz 0.554260 0.148214
vkcqh 0.265864 0.773320
B Y cqokn 0.577504 0.252534
spghx 0.472612 0.487480
wjsxg 0.815309 0.237767
Z holoo 0.459850 0.432981
lnihf 0.667877 0.317932
wirhq 0.138879 0.614401
But I can't figure out how to use margins to get the subtotals for the groupby operations ['section', 'topic'] and ['section', 'subsection', 'topic'].
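For comparison, here is roughly what margins does in pivot_table. As far as I know, margins=True adds only a single grand-total 'All' row and an 'All' column computed over all rows; it does not insert the nested per-section or per-subsection subtotals asked for here. A minimal sketch, assuming only a two-names-per-section subset of the question's data for brevity:

```python
import pandas as pd

# Subset of the question's data (two names per section) to keep the sketch short.
df = pd.DataFrame(
    [['A', 'W', 'zwphf', 'a', 0.80242702],
     ['A', 'W', 'jcyyc', 'a', 0.40407741],
     ['B', 'Y', 'spghx', 'a', 0.47261223],
     ['B', 'Y', 'cqokn', 'a', 0.57750357],
     ['A', 'W', 'zwphf', 'b', 0.67371101],
     ['A', 'W', 'jcyyc', 'b', 0.50796174],
     ['B', 'Y', 'spghx', 'b', 0.48747995],
     ['B', 'Y', 'cqokn', 'b', 0.25253355]],
    columns=['section', 'subsection', 'name', 'topic', 'score'])

# margins=True appends ONE grand-total row and ONE grand-total column named 'All';
# there are no per-section or per-subsection subtotal rows in the result.
table = df.pivot_table(index=['section', 'subsection', 'name'],
                       columns='topic', values='score',
                       aggfunc='mean', margins=True)
print(table)
```

That is why the subtotals have to be built separately, one groupby per level, and concatenated.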
To recreate df:
import pandas as pd
data = [['A', 'W', 'zwphf', 'a', 0.80242702],
['A', 'W', 'jcyyc', 'a', 0.40407741],
['A', 'W', 'kucem', 'a', 0.36731944],
['A', 'X', 'ldbxz', 'a', 0.55426007],
['A', 'X', 'vkcqh', 'a', 0.26586396],
['A', 'X', 'cvksn', 'a', 0.54809939],
['B', 'Y', 'spghx', 'a', 0.47261223],
['B', 'Y', 'cqokn', 'a', 0.57750357],
['B', 'Y', 'wjsxg', 'a', 0.81530899],
['B', 'Z', 'holoo', 'a', 0.45985020],
['B', 'Z', 'lnihf', 'a', 0.66787651],
['B', 'Z', 'wirhq', 'a', 0.13887864],
['A', 'W', 'zwphf', 'b', 0.67371101],
['A', 'W', 'jcyyc', 'b', 0.50796174],
['A', 'W', 'kucem', 'b', 0.54605544],
['A', 'X', 'ldbxz', 'b', 0.14821402],
['A', 'X', 'vkcqh', 'b', 0.77331968],
['A', 'X', 'cvksn', 'b', 0.79198960],
['B', 'Y', 'spghx', 'b', 0.48747995],
['B', 'Y', 'cqokn', 'b', 0.25253355],
['B', 'Y', 'wjsxg', 'b', 0.23776694],
['B', 'Z', 'holoo', 'b', 0.43298050],
['B', 'Z', 'lnihf', 'b', 0.31793156],
['B', 'Z', 'wirhq', 'b', 0.61440056]]
df = pd.DataFrame(data,
columns=['section', 'subsection', 'name', 'topic', 'score'])
To recreate the expected result:
import numpy as np
result = np.array([['A', 'All', 'All', 0.490341219, 0.573541919],
['A', 'W', 'All', 0.52460796, 0.5759094],
['A', 'W', 'jcyyc', 0.404077415, 0.5079617479999999],
['A', 'W', 'kucem', 0.36731944, 0.546055442],
['A', 'W', 'zwphf', 0.8024270240000001, 0.673711011],
['A', 'X', 'All', 0.45607447700000003, 0.571174437],
['A', 'X', 'cvksn', 0.548099391, 0.791989603],
['A', 'X', 'ldbxz', 0.554260074, 0.148214029],
['A', 'X', 'vkcqh', 0.265863967, 0.77331968],
['B', 'All', 'All', 0.5220050279999999, 0.390515513],
['B', 'Y', 'All', 0.621808268, 0.325926816],
['B', 'Y', 'cqokn', 0.577503576, 0.252533557],
['B', 'Y', 'spghx', 0.472612233, 0.487479951],
['B', 'Y', 'wjsxg', 0.815308995, 0.237766941],
['B', 'Z', 'All', 0.42220178799999997, 0.455104209],
['B', 'Z', 'holoo', 0.459850205, 0.43298050200000004],
['B', 'Z', 'lnihf', 0.667876511, 0.317931565],
['B', 'Z', 'wirhq', 0.13887864800000002, 0.61440056]], dtype=object)
result = pd.DataFrame(result, columns=['section', 'subsection', 'name', 'a', 'b'])
You need to compute the subtotals separately and concatenate them with the plain groupby:
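s = df.groupby(['section', 'subsection', 'name', 'topic'])['score'].mean().unstack('topic')
# per-section subtotal: mean of the name-level means (mean(level=...) was removed in pandas 2.0)
s1 = (s.groupby(level=0).mean()
        .assign(subsection='All', name='All')
        .set_index(['subsection', 'name'], append=True))
# per-subsection subtotal: mean of the name-level means
s2 = (s.groupby(level=[0, 1]).mean()
        .assign(name='All')
        .set_index(['name'], append=True))
s = pd.concat([s, s1, s2]).sort_index()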
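The version above averages the name-level means in s rather than the raw rows. That gives the same numbers here only because every name contributes exactly one score per topic; with unequal group sizes a mean of means differs from the overall mean. A toy illustration with made-up numbers (not from the question):

```python
import pandas as pd

# Made-up data: group 'x' has three rows, group 'y' only one.
toy = pd.DataFrame({'g': ['x', 'x', 'x', 'y'],
                    'score': [1.0, 2.0, 3.0, 10.0]})

# Mean over the raw rows: (1 + 2 + 3 + 10) / 4
direct = toy['score'].mean()

# Mean of the per-group means: (2.0 + 10.0) / 2
mean_of_means = toy.groupby('g')['score'].mean().mean()

print(direct, mean_of_means)  # 4.0 6.0
```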
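But if you'd rather not depend on means of sub-means (and on checking whether that is actually correct for the mean), it is better to aggregate each subtotal directly from df: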
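# recompute the plain groupby first (s was overwritten by the concat above)
s = df.groupby(['section', 'subsection', 'name', 'topic'])['score'].mean().unstack('topic')
s1 = (df.groupby(['section', 'topic'])['score'].mean().unstack('topic')
        .assign(subsection='All', name='All').set_index(['subsection', 'name'], append=True))
s2 = (df.groupby(['section', 'subsection', 'topic'])['score'].mean().unstack('topic')
        .assign(name='All').set_index(['name'], append=True))
s = pd.concat([s, s1, s2]).sort_index()
print(s)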
topic a b
section subsection name
A All All 0.490341 0.573542
W All 0.524608 0.575909
jcyyc 0.404077 0.507962
kucem 0.367319 0.546055
zwphf 0.802427 0.673711
X All 0.456074 0.571174
cvksn 0.548099 0.791990
ldbxz 0.554260 0.148214
vkcqh 0.265864 0.773320
B All All 0.522005 0.390516
Y All 0.621808 0.325927
cqokn 0.577504 0.252534
spghx 0.472612 0.487480
wjsxg 0.815309 0.237767
Z All 0.422202 0.455104
holoo 0.459850 0.432980
lnihf 0.667877 0.317932
wirhq 0.138879 0.614401
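The same pattern extends to any number of grouping levels. Below is a sketch of a hypothetical helper, add_subtotals (not a pandas function), that builds one subtotal frame for every proper prefix of the key list and, like the second version above, aggregates each subtotal from the raw rows. It assumes the label 'All' sorts before the real values, which holds for this data.

```python
import pandas as pd


def add_subtotals(df, keys, column, value, label='All'):
    """Hypothetical helper (not part of pandas): mean of `value` grouped by
    `keys` and unstacked on `column`, plus a subtotal row, filled with `label`,
    for every proper prefix of `keys`, each aggregated from the raw rows."""
    pieces = [df.groupby(keys + [column])[value].mean().unstack(column)]
    for i in range(1, len(keys)):
        prefix, rest = keys[:i], keys[i:]
        sub = (df.groupby(prefix + [column])[value].mean()
                 .unstack(column)
                 # fill the remaining key columns with the subtotal label
                 .assign(**{k: label for k in rest})
                 .set_index(rest, append=True))
        pieces.append(sub)
    # 'All' sorts before W/X/Y/Z and before the lowercase names in this data
    return pd.concat(pieces).sort_index()


data = [['A', 'W', 'zwphf', 'a', 0.80242702], ['A', 'W', 'jcyyc', 'a', 0.40407741],
        ['A', 'W', 'kucem', 'a', 0.36731944], ['A', 'X', 'ldbxz', 'a', 0.55426007],
        ['A', 'X', 'vkcqh', 'a', 0.26586396], ['A', 'X', 'cvksn', 'a', 0.54809939],
        ['B', 'Y', 'spghx', 'a', 0.47261223], ['B', 'Y', 'cqokn', 'a', 0.57750357],
        ['B', 'Y', 'wjsxg', 'a', 0.81530899], ['B', 'Z', 'holoo', 'a', 0.45985020],
        ['B', 'Z', 'lnihf', 'a', 0.66787651], ['B', 'Z', 'wirhq', 'a', 0.13887864],
        ['A', 'W', 'zwphf', 'b', 0.67371101], ['A', 'W', 'jcyyc', 'b', 0.50796174],
        ['A', 'W', 'kucem', 'b', 0.54605544], ['A', 'X', 'ldbxz', 'b', 0.14821402],
        ['A', 'X', 'vkcqh', 'b', 0.77331968], ['A', 'X', 'cvksn', 'b', 0.79198960],
        ['B', 'Y', 'spghx', 'b', 0.48747995], ['B', 'Y', 'cqokn', 'b', 0.25253355],
        ['B', 'Y', 'wjsxg', 'b', 0.23776694], ['B', 'Z', 'holoo', 'b', 0.43298050],
        ['B', 'Z', 'lnihf', 'b', 0.31793156], ['B', 'Z', 'wirhq', 'b', 0.61440056]]
df = pd.DataFrame(data, columns=['section', 'subsection', 'name', 'topic', 'score'])

out = add_subtotals(df, ['section', 'subsection', 'name'], 'topic', 'score')
print(out)
```

Adding another key level only means passing a longer keys list; the loop then emits one extra subtotal frame per additional prefix.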
Edit:
If you need a specific ordering, you can use tot instead of All, together with ordered categoricals:
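# 'tot' first, then the values in their original order of appearance
cat1 = ['tot'] + df['subsection'].unique().tolist()
cat2 = ['tot'] + df['name'].unique().tolist()
df['subsection'] = pd.Categorical(df['subsection'], categories=cat1, ordered=True)
df['name'] = pd.Categorical(df['name'], categories=cat2, ordered=True)
# observed=True: keep only category combinations that actually occur in df
s = df.groupby(['section', 'subsection', 'name', 'topic'], observed=True)['score'].mean().unstack('topic')
s1 = (df.groupby(['section','topic'])['score'].mean()
        .unstack('topic').assign(subsection='tot', name='tot')
        .set_index(['subsection','name'], append=True))
s2 = (df.groupby(['section','subsection','topic'], observed=True)['score'].mean()
        .unstack('topic')
        .assign(name='tot')
        .set_index(['name'], append=True))
s = pd.concat([s, s1, s2])
# make sure the index levels are ordered categoricals (concat may not preserve
# that), so sort_index uses category order instead of string order
idx = s.index
s.index = pd.MultiIndex.from_arrays(
    [idx.get_level_values('section'),
     pd.Categorical(idx.get_level_values('subsection'), categories=cat1, ordered=True),
     pd.Categorical(idx.get_level_values('name'), categories=cat2, ordered=True)],
    names=idx.names)
s = s.sort_index()
print(s)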
topic a b
section subsection name
A tot tot 0.490341 0.573542
W tot 0.524608 0.575909
zwphf 0.802427 0.673711
jcyyc 0.404077 0.507962
kucem 0.367319 0.546055
X tot 0.456074 0.571174
ldbxz 0.554260 0.148214
vkcqh 0.265864 0.773320
cvksn 0.548099 0.791990
B tot tot 0.522005 0.390516
Y tot 0.621808 0.325927
spghx 0.472612 0.487480
cqokn 0.577504 0.252534
wjsxg 0.815309 0.237767
Z tot 0.422202 0.455104
holoo 0.459850 0.432980
lnihf 0.667877 0.317932
wirhq 0.138879 0.614401
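Why the categoricals matter: sort_index on plain strings sorts lexicographically, so the names lose their original order and a lowercase label like tot would land wherever the alphabet puts it. An ordered categorical sorts by its declared category order instead. A small sketch using a few of the question's names:

```python
import pandas as pd

names = ['zwphf', 'jcyyc', 'kucem', 'tot']

# Plain strings sort lexicographically: the original order is lost.
plain = pd.Index(names).sort_values()

# Ordered categorical: sorts by the declared category order, 'tot' first.
order = ['tot', 'zwphf', 'jcyyc', 'kucem']
cat = pd.CategoricalIndex(names, categories=order, ordered=True).sort_values()

print(list(plain))  # ['jcyyc', 'kucem', 'tot', 'zwphf']
print(list(cat))    # ['tot', 'zwphf', 'jcyyc', 'kucem']
```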