Python Pandas: sort between groups but not within groups (rearrange the grouped rows while keeping the row order from before the groupby)

Posted by Chris in Dev

Chris

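I want to sort rows by one column (in my example, 'Group' is the column to group by), so that the groups themselves are sorted while the row order inside each group is preserved. I can't sort by index, because the index is deliberately out of order as a result of earlier operations.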

import pandas as pd

df = pd.DataFrame({
    'Group': [5, 5, 5, 9, 9, 777, 777, 1, 2, 2],
    'V1': ['a', 'b', 'a', 3, 6, 1, None, 10, 3, None],
    'V2': ['blah', 'blah', 'blah', 'dog', 'cat', 'cat', 'na', 'first', 'last', 'nada'],
    'V3': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
})

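I would like the result ordered by Group (1, 2, 2, 5, 5, 5, 9, 9, 777, 777), with the rows inside each group left in their original order.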

I have tried various things, such as

df.groupby(['Group'])['Group'].aggregate({'min grp':'min'}).sort_values(by=['min grp'], ascending=True)

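In case it helps: the original df was created with pd.concat(list-of-dataframes). When I then sort it by Group, the rows inside each Group also get sorted by index, which doesn't work for my particular problem.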

Andy L.

You need to use sort_values with the option kind='mergesort'. From the pandas documentation:

kind : {‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’
      Choice of sorting algorithm. See also ndarray.np.sort for more
      information. mergesort is the only stable algorithm. For DataFrames,
      this option is only applied when sorting on a single column or label.

A sorting algorithm is called stable when any two elements with equal keys appear in the output in the same order as they did in the input. Stable sorting algorithms include insertion sort, merge sort, bubble sort, Timsort, and counting sort.
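To make the definition concrete, here is a minimal sketch in plain Python. It relies on the built-in sorted(), which is documented as stable; the records list and its labels are made up for illustration and are not part of the question's data.

```python
# Each record is (key, label); the labels only mark the input order.
records = [(5, "a"), (9, "b"), (5, "c"), (1, "d"), (9, "e"), (5, "f")]

# Sort on the key alone. Because sorted() is stable, records that share
# a key stay in the order they had in the input.
stable = sorted(records, key=lambda r: r[0])

print(stable)
# [(1, 'd'), (5, 'a'), (5, 'c'), (5, 'f'), (9, 'b'), (9, 'e')]
```

The three records with key 5 come out as a, c, f, exactly as they appeared in the input. That is the behaviour wanted for the rows inside each Group.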

So you need:

df = df.sort_values('Group', kind='mergesort')

When you call sort_values without specifying kind, it uses the default, 'quicksort', and quicksort is not stable.
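Putting it together, the following sketch applies the stable sort to the DataFrame from the question and checks the row order. The orig_pos column is a hypothetical helper added only for this check, so that the verification does not depend on the index.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [5, 5, 5, 9, 9, 777, 777, 1, 2, 2],
    'V1': ['a', 'b', 'a', 3, 6, 1, None, 10, 3, None],
    'V2': ['blah', 'blah', 'blah', 'dog', 'cat', 'cat', 'na', 'first', 'last', 'nada'],
    'V3': [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
})

# Hypothetical helper column: each row's position before sorting.
df['orig_pos'] = range(len(df))

# Stable sort on the single column 'Group'.
result = df.sort_values('Group', kind='mergesort')

# Groups are now in ascending order, and within each group the
# original positions are still increasing.
print(result[['Group', 'V2', 'orig_pos']])
```

Within each Group value, orig_pos should still be increasing, which means the rows kept the relative order they had before the sort.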

Edited on 2021-01-21