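Suppose I have a data frame like the one below. It describes the composition of paints: any named paint (the NAME column) is described as a mix of specific colours (the subtype column) in given percentages (the weight column), and each specific colour can also be generalised to its parent colour (the type column). The date column is included for completeness but is not used here.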
|weight|NAME       |type  |subtype    |date      |
--------------------------------------------------
|93.35 |candyapple |red   |maroon     |2018-06-30|
|6.65  |candyapple |red   |crimson    |2018-06-30|
|93.41 |grannysmith|green |limegreen  |2010-03-31|
|1.78  |grannysmith|green |deepgreen  |2019-12-31|
|0.72  |grannysmith|yellow|goldyellow |2019-12-31|
|2.96  |grannysmith|brown |lightbrown |2014-10-31|
|33.33 |awfulbrown |red   |maroon     |2020-10-31|
|33.33 |awfulbrown |yellow|plainyellow|2010-06-30|
|33.33 |awfulbrown |green |deepgreen  |2020-02-29|
--------------------------------------------------
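For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: the variable names rows and df are my own choice, and converting date to a datetime is optional since the column is not used afterwards.

```python
import pandas as pd

# Rows copied from the table above: (weight, NAME, type, subtype, date)
rows = [
    (93.35, "candyapple", "red", "maroon", "2018-06-30"),
    (6.65, "candyapple", "red", "crimson", "2018-06-30"),
    (93.41, "grannysmith", "green", "limegreen", "2010-03-31"),
    (1.78, "grannysmith", "green", "deepgreen", "2019-12-31"),
    (0.72, "grannysmith", "yellow", "goldyellow", "2019-12-31"),
    (2.96, "grannysmith", "brown", "lightbrown", "2014-10-31"),
    (33.33, "awfulbrown", "red", "maroon", "2020-10-31"),
    (33.33, "awfulbrown", "yellow", "plainyellow", "2010-06-30"),
    (33.33, "awfulbrown", "green", "deepgreen", "2020-02-29"),
]

df = pd.DataFrame(rows, columns=["weight", "NAME", "type", "subtype", "date"])

# Optional: parse the date strings (the column is not used later on)
df["date"] = pd.to_datetime(df["date"])

print(df)
```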
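So for candyapple, the full composition is 93.35% maroon and 6.65% crimson, both of which are subtypes of red. grannysmith can be expressed in terms of the subtypes above, but we could also describe it by type: 95.19% green (the sum of its green subtypes, 93.41% limegreen and 1.78% deepgreen), 0.72% yellow and 2.96% brown. Across paint compositions, the names used for subtypes and types are shared, but not every composition lists every subtype. If a subtype is not listed, it is assumed to be 0.00%. For example, candyapple lists nothing under green, so we can assume it is 0.00% limegreen.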
|NAME       |maroon|crimson|limegreen|deepgreen|goldyellow|lightbrown|plainyellow|
----------------------------------------------------------------------------------
|candyapple |93.35 |6.65   |0.00     |0.00     |0.00      |0.00      |0.00       |
|grannysmith|0.00  |0.00   |93.41    |1.78     |0.72      |2.96      |0.00       |
|awfulbrown |33.33 |0.00   |0.00     |33.33    |0.00      |0.00      |33.33      |
----------------------------------------------------------------------------------
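1a. Using pandas, how can I transpose this so that the values of subtype become the column headers and all the values for each NAME are collected into a single row?
1b. After transposing, how do I fill any blanks in the table with 0.00? (For example, candyapple is 0.00% limegreen.)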
2. What if the columns should be type rather than subtype? The weight of a type is the sum of the weights of its subtypes.
|NAME       |red   |green |yellow|brown |
-----------------------------------------
|candyapple |100.00|0.00  |0.00  |0.00  |
|grannysmith|0.00  |95.19 |0.72  |2.96  |
|awfulbrown |33.33 |33.33 |33.33 |0.00  |
-----------------------------------------
2a. Having transposed as in (1), but this time using type, how can I sum the values with pandas / Python so that the weight for a given type is the sum of the weights of its subtypes?
|NAME       |red   |green |yellow|brown |maroon|crimson|limegreen|deepgreen|goldyellow|lightbrown|plainyellow|
--------------------------------------------------------------------------------------------------------------
|candyapple |100.00|0.00  |0.00  |0.00  |93.35 |6.65   |0.00     |0.00     |0.00      |0.00      |0.00       |
|grannysmith|0.00  |95.19 |0.72  |2.96  |0.00  |0.00   |93.41    |1.78     |0.72      |2.96      |0.00       |
|awfulbrown |33.33 |33.33 |33.33 |0.00  |33.33 |0.00   |0.00     |33.33    |0.00      |0.00      |33.33      |
--------------------------------------------------------------------------------------------------------------
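3a. Does pandas have a way to build the combined DataFrame above, holding both the summed type weights and the individual subtype weights, directly from the original dataset?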
For the first case, a pivot is enough, since no aggregation is needed. The arguments are passed by keyword here because recent pandas versions no longer accept them positionally:
df.pivot(index='NAME', columns='subtype', values='weight').fillna(0)
subtype      crimson  deepgreen  goldyellow  lightbrown  limegreen  maroon  \
NAME
awfulbrown      0.00      33.33        0.00        0.00       0.00   33.33
candyapple      6.65       0.00        0.00        0.00       0.00   93.35
grannysmith     0.00       1.78        0.72        2.96      93.41    0.00

subtype      plainyellow
NAME
awfulbrown         33.33
candyapple          0.00
grannysmith         0.00
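Here is a self-contained version of the pivot call that can be run as is. It is a sketch: the frame is rebuilt from the question's table without the date column, and the names df and wide are illustrative. pivot raises a ValueError when the same (NAME, subtype) pair occurs more than once, so the sketch checks for duplicates first.

```python
import pandas as pd

# Sample data from the question (date column omitted, it is not needed here)
df = pd.DataFrame({
    "weight": [93.35, 6.65, 93.41, 1.78, 0.72, 2.96, 33.33, 33.33, 33.33],
    "NAME": ["candyapple"] * 2 + ["grannysmith"] * 4 + ["awfulbrown"] * 3,
    "type": ["red", "red", "green", "green", "yellow", "brown",
             "red", "yellow", "green"],
    "subtype": ["maroon", "crimson", "limegreen", "deepgreen", "goldyellow",
                "lightbrown", "maroon", "plainyellow", "deepgreen"],
})

# pivot does not aggregate, so each (NAME, subtype) pair must be unique
has_duplicates = df.duplicated(["NAME", "subtype"]).any()
print("duplicate (NAME, subtype) pairs:", has_duplicates)

# One row per NAME, one column per subtype; unlisted subtypes become 0
wide = df.pivot(index="NAME", columns="subtype", values="weight").fillna(0)
print(wide)
```

If duplicates were present, pivot_table with an aggregation function would be the safer choice.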
For the second case, you can use pivot_table and aggregate with sum:
df.pivot_table(index='NAME', columns='type', values='weight', aggfunc='sum', fill_value=0)
type         brown  green     red  yellow
NAME
awfulbrown    0.00  33.33   33.33   33.33
candyapple    0.00   0.00  100.00    0.00
grannysmith   2.96  95.19    0.00    0.72
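The combined table from question 3a is not covered above. One possible approach, offered as a sketch and not as a built-in one-step pandas feature, is to build both pivots and join them side by side with pd.concat. It assumes that no type name is also used as a subtype name, which holds for this sample data. The names by_type, by_subtype and combined are illustrative.

```python
import pandas as pd

# Sample data from the question (date column omitted)
df = pd.DataFrame({
    "weight": [93.35, 6.65, 93.41, 1.78, 0.72, 2.96, 33.33, 33.33, 33.33],
    "NAME": ["candyapple"] * 2 + ["grannysmith"] * 4 + ["awfulbrown"] * 3,
    "type": ["red", "red", "green", "green", "yellow", "brown",
             "red", "yellow", "green"],
    "subtype": ["maroon", "crimson", "limegreen", "deepgreen", "goldyellow",
                "lightbrown", "maroon", "plainyellow", "deepgreen"],
})

# Summed weight per type, one row per NAME
by_type = df.pivot_table(index="NAME", columns="type", values="weight",
                         aggfunc="sum", fill_value=0)

# Individual weight per subtype, one row per NAME
by_subtype = df.pivot_table(index="NAME", columns="subtype", values="weight",
                            aggfunc="sum", fill_value=0)

# Join the two tables column-wise; rows are aligned on the NAME index
combined = pd.concat([by_type, by_subtype], axis=1)
print(combined)
```

The columns come out in alphabetical order within each group. To get a specific order, select them explicitly, for example combined[["red", "green", "yellow", "brown"]] for the type part.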