Pandas: using groupby and filtering by value

Username

I have a pandas DataFrame df that looks like this:

| source_num| source_date| text      | category    |location    | source |
+---------+------------+-------------+-------------+------------+--------+---
|  0      | 15/12/2020 | text1       | cat 1       | loc1       |soucrce1|
|  1      | 15/12/2020 | text2       | cat 2       | loc2       |source 2|
|  2      | 15/12/2020 | text3       | cat 3       | loc2       |source 3|
|  3      | 15/12/2020 | text4       | cat 2       | loc3       |source 2|
| ...     | ...        | ...         |             |            |        |

Now I can group this DataFrame by its key columns (category, source_num, source, location) and aggregate the values:

grouped = df.groupby(['category','source_num',"source","location"]).aggregate('sum')

The statement above returns the correct result.

However, when I try to filter the grouped result, I get the following error:

grouped.filter(lambda x: x['location']== 'loc2')


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-86-d8176314331e> in <module>
      3 
      4 grouped = wasi2adf.groupby(['category    ','source_num',"source","location"]).aggregate('sum')
----> 5 grouped.filter(lambda x: x['location']== 'loc2')
      
f:\aienv\lib\site-packages\pandas\core\generic.py in filter(self, items, like, regex, axis)
   4618         if items is not None:
   4619             name = self._get_axis_name(axis)
-> 4620             return self.reindex(**{name: [r for r in items if r in labels]})
   4621         elif like:
   4622 

TypeError: 'function' object is not iterable 

Expected result after filtering:

| source_num| source_date| text      | category    |location    | source |
+---------+------------+-------------+-------------+------------+--------+---
|  0      | 15/12/2020 | text2       | cat 2       | loc2       |soucrce2|
|  1      | 15/12/2020 | text3       | cat 3       | loc2       |source 3|
sammywemmy

Try this:

grouped = df.groupby(['category','source_num',"source","location"], as_index = False).aggregate('sum')
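For context, here is a minimal sketch of what as_index=False changes. The DataFrame is rebuilt by hand from the four sample rows in the question, and the names keys, indexed and flat are only illustrative:

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "source_num": [0, 1, 2, 3],
    "source_date": ["15/12/2020"] * 4,
    "text": ["text1", "text2", "text3", "text4"],
    "category": ["cat 1", "cat 2", "cat 3", "cat 2"],
    "location": ["loc1", "loc2", "loc2", "loc3"],
    "source": ["soucrce1", "source 2", "source 3", "source 2"],
})

keys = ["category", "source_num", "source", "location"]

# Default (as_index=True): the grouping keys become a MultiIndex,
# so "location" is no longer a column of the result.
indexed = df.groupby(keys).aggregate("sum")

# as_index=False: the grouping keys stay as ordinary columns.
flat = df.groupby(keys, as_index=False).aggregate("sum")

print("location" in indexed.columns)  # False
print("location" in flat.columns)     # True

# With the default index, the level has to be read from the index instead.
loc2_rows = indexed[indexed.index.get_level_values("location") == "loc2"]
print(loc2_rows)
```

Keeping the keys as columns is what makes a plain boolean mask on "location" possible.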

Then filter for the specific value in location:

 grouped.loc[grouped["location"] == "loc2"]


category    source_num  source  location    source_date text
1   cat 2   1   source 2    loc2    15/12/2020  text2
3   cat 3   2   source 3    loc2    15/12/2020  text3
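As for why the original call failed: judging by the traceback, grouped is already a plain DataFrame once aggregate has run, so grouped.filter resolves to DataFrame.filter, which selects labels through items, like or regex and tries to iterate over the lambda. GroupBy.filter is a different method that lives on the groupby object itself and expects a function returning a single boolean per group. The sketch below, again using hand-built sample rows from the question and illustrative variable names, shows the type difference and an equivalent approach that filters the rows before grouping:

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "source_num": [0, 1, 2, 3],
    "source_date": ["15/12/2020"] * 4,
    "text": ["text1", "text2", "text3", "text4"],
    "category": ["cat 1", "cat 2", "cat 3", "cat 2"],
    "location": ["loc1", "loc2", "loc2", "loc3"],
    "source": ["soucrce1", "source 2", "source 3", "source 2"],
})

keys = ["category", "source_num", "source", "location"]

gb = df.groupby(keys, as_index=False)  # the groupby object
grouped = gb.aggregate("sum")          # aggregation returns a DataFrame

print(type(gb).__name__)       # DataFrameGroupBy
print(type(grouped).__name__)  # DataFrame

# Boolean mask on the aggregated DataFrame, as in the answer.
result = grouped.loc[grouped["location"] == "loc2"]
print(result)

# Alternative: because "location" is one of the grouping keys, filtering
# the rows first and grouping afterwards yields the same groups.
pre_filtered = (
    df[df["location"] == "loc2"]
    .groupby(keys, as_index=False)
    .aggregate("sum")
)
print(pre_filtered)
```

Filtering first avoids aggregating groups that are thrown away afterwards, which may matter on a larger DataFrame.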
