Confused by Pandas behavior in groupby

user8270077

I have a large data set that contains a binary variable:

Transactions['has_acc_id_and_cus_id'].value_counts()
1    1295130
0     823869
Name: has_acc_id_and_cus_id, dtype: int64

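When I group this data set (Transactions) using this binary variable as one of the grouping variables, I get a grouped data set (df100) that contains only one level of the binary variable.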

df100 = Transactions.groupby(['acc_reg_year', 'acc_reg_month', 'year', 'month',\
                              'has_acc_id_and_cus_id'])[['net_revenue']].agg(['sum', 'mean', 'count'])

df100['has_acc_id_and_cus_id'].value_counts()
1    1421
Name: has_acc_id_and_cus_id, dtype: int64
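One thing worth keeping in mind when reading the output above: after a multi-key groupby, has_acc_id_and_cus_id lives in the result's index, and each row of df100 is one combination of the five keys, not one transaction. Counting levels in df100 therefore counts groups, not rows. The sketch below uses a small hypothetical stand-in for Transactions (made-up values, not the asker's data) to show the difference; summing the per-group count column recovers the per-level row counts.

```python
import pandas as pd

# Hypothetical toy stand-in for Transactions (values are invented for illustration).
transactions = pd.DataFrame({
    "acc_reg_year":  [2019, 2019, 2020, 2020, 2020],
    "acc_reg_month": [1, 1, 2, 2, 3],
    "year":          [2020, 2020, 2021, 2021, 2021],
    "month":         [5, 5, 6, 6, 7],
    "has_acc_id_and_cus_id": [1, 1, 1, 0, 0],
    "net_revenue":   [10.0, 20.0, 30.0, 40.0, 50.0],
})

keys = ["acc_reg_year", "acc_reg_month", "year", "month", "has_acc_id_and_cus_id"]
df100 = transactions.groupby(keys)[["net_revenue"]].agg(["sum", "mean", "count"])

# The grouping keys are now levels of df100's MultiIndex: one row per key combination.
groups_per_level = df100.index.get_level_values("has_acc_id_and_cus_id").value_counts()

# Summing the per-group 'count' column gives back the number of original rows per level.
rows_per_level = df100[("net_revenue", "count")].groupby(level="has_acc_id_and_cus_id").sum()

print(groups_per_level)
print(rows_per_level)
```

Here both levels have two groups, while the original rows split 3 to 2, which matches transactions["has_acc_id_and_cus_id"].value_counts().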

slackline

If you really just want to group by has_acc_id_and_cus_id, then the command you want is...

df100 = Transactions[['has_acc_id_and_cus_id', 'net_revenue']].groupby(['has_acc_id_and_cus_id']).agg(['sum', 'mean', 'count'])

This subsets the data to just the variable you want to summarize by (has_acc_id_and_cus_id) and the variable you want to summarize (net_revenue)...

Transactions[['has_acc_id_and_cus_id', 'net_revenue']]

...then you group these by has_acc_id_and_cus_id...

Transactions[['has_acc_id_and_cus_id', 'net_revenue']].groupby('has_acc_id_and_cus_id')

...and then apply the agg() function to get the statistics you want.
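A minimal runnable version of the subset, group, aggregate chain described above, using a small hypothetical stand-in for Transactions (invented values). With has_acc_id_and_cus_id as the only key, both levels 0 and 1 appear in the result, one row each.

```python
import pandas as pd

# Hypothetical toy stand-in for Transactions (values are invented for illustration).
transactions = pd.DataFrame({
    "has_acc_id_and_cus_id": [1, 1, 1, 0, 0],
    "net_revenue": [10.0, 20.0, 30.0, 40.0, 50.0],
})

summary = (
    transactions[["has_acc_id_and_cus_id", "net_revenue"]]  # keep only key + value column
    .groupby("has_acc_id_and_cus_id")                        # one group per level (0 and 1)
    .agg(["sum", "mean", "count"])                           # columns become a MultiIndex
)
print(summary)
```

For level 0 this gives sum 90.0, mean 45.0, count 2; for level 1, sum 60.0, mean 20.0, count 3.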

Given your stated goal of summarizing by has_acc_id_and_cus_id alone, your mistake is that you were also grouping by four other variables (acc_reg_year, acc_reg_month, year and month).

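If you really do want to summarize by has_acc_id_and_cus_id within all of those, then your original code is correct, but perhaps one or more of acc_reg_year, acc_reg_month, year and month has missing values whenever has_acc_id_and_cus_id == 0, so check your data...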

Transactions[Transactions['has_acc_id_and_cus_id'] == 0][['acc_reg_year', 'acc_reg_month', 'year', 'month']].head(100)
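The missing-value hypothesis matters because groupby drops any row whose grouping key is NaN by default (dropna=True); passing dropna=False (available since pandas 1.1) keeps NaN as its own group label. The sketch below is built on hypothetical toy data in which the has_acc_id_and_cus_id == 0 rows lack registration dates (an assumption chosen to reproduce the symptom, not a claim about the real data). It counts NaNs in the other keys for those rows and compares the two dropna settings.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the 0 rows have no acc_reg_year / acc_reg_month (assumed for illustration).
transactions = pd.DataFrame({
    "acc_reg_year":  [2019, 2019, 2020, np.nan, np.nan],
    "acc_reg_month": [1, 1, 2, np.nan, np.nan],
    "year":          [2020, 2020, 2021, 2021, 2021],
    "month":         [5, 5, 6, 6, 7],
    "has_acc_id_and_cus_id": [1, 1, 1, 0, 0],
    "net_revenue":   [10.0, 20.0, 30.0, 40.0, 50.0],
})
keys = ["acc_reg_year", "acc_reg_month", "year", "month", "has_acc_id_and_cus_id"]
other_keys = keys[:-1]

# Number of missing values in each of the other keys, restricted to the 0 rows.
missing_in_zero_rows = transactions.loc[
    transactions["has_acc_id_and_cus_id"] == 0, other_keys
].isna().sum()

# Default dropna=True: rows with a NaN key are silently excluded, so level 0 disappears.
default = transactions.groupby(keys)[["net_revenue"]].agg(["sum", "mean", "count"])

# dropna=False (pandas >= 1.1): NaN is kept as a group label, so level 0 survives.
keep_nan = transactions.groupby(keys, dropna=False)[["net_revenue"]].agg(["sum", "mean", "count"])

levels_default = set(default.index.get_level_values("has_acc_id_and_cus_id"))
levels_keep_nan = set(keep_nan.index.get_level_values("has_acc_id_and_cus_id"))

print(missing_in_zero_rows)
print(levels_default, levels_keep_nan)
```

If the real Transactions shows nonzero counts here, that would explain why only level 1 survives the original five-key groupby.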


Edited 2021-07-14
