Pandas: ranking the word count in one column by the values in another column


马特·霍夫

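I have two columns: df['upvotes'] and df['headline']. The headline column holds the headline strings, and the upvotes column holds integers.

I want to use pandas to find out which headline word count gets the most upvotes.

What is the best way to do this?

So far I have the following, but the apply method passes a Series to x, so clearly I don't understand how this works.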

df.groupby('upvotes')['headline'].apply(lambda x: len(x.split(' '))).sort_index(ascending=False)

First 5 rows of the data:

   upvotes                                           headline                  
0        1  Software: Sadly we did adopt from the construc...                  
1        1   Google’s Stock Split Means More Control for L...                  
2        1  SSL DOS attack tool released exploiting negoti...                  
3       67       Immutability and Blocks Lambdas and Closures                  
4        1         Comment optimiser la vitesse de Wordpress?

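If I understand your question correctly, you can use groupby.mean for this. You can substitute groupby.sum if that suits your needs better.

In general, it is best to avoid lambda functions where a built-in pandas method does the job.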

import pandas as pd

df = pd.DataFrame({'upvotes': [1, 1, 1, 67, 1],
                   'headline': ['Software: Sadly we did adopt from the', 'Google’s Stock Split Means More Control for',
                                'SSL DOS attack tool released exploiting', 'Immutability and Blocks Lambdas and Closures',
                                'Comment optimiser la vitesse de Wordpress? ']})

# Split each headline on whitespace and count the resulting words.
df['wordcount'] = df['headline'].str.split().map(len)

# Average the upvotes per word count, highest average first.
df = (df.groupby('wordcount', as_index=False)['upvotes'].mean()
        .sort_values('upvotes', ascending=False))

print(df)

#    wordcount  upvotes
# 0          6     23.0
# 1          7      1.0
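
As for why the attempt in the question fails: grouping by upvotes hands each group's headlines to the lambda as a Series, and a Series has no split method. The sketch below is one possible variation on the approach above, not the only way to do it. It assumes pandas 0.25 or later for named aggregation, and the names summary, mean_upvotes, total_upvotes, n_headlines and best_wordcount are made up for illustration.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'upvotes': [1, 1, 1, 67, 1],
    'headline': ['Software: Sadly we did adopt from the',
                 'Google’s Stock Split Means More Control for',
                 'SSL DOS attack tool released exploiting',
                 'Immutability and Blocks Lambdas and Closures',
                 'Comment optimiser la vitesse de Wordpress? '],
})

# Vectorised word count: .str.split() yields a list per row,
# and .str.len() returns the length of each list.
df['wordcount'] = df['headline'].str.split().str.len()

# Group by word count (not by upvotes) and compute several
# statistics at once. The column names here are illustrative.
summary = (
    df.groupby('wordcount')['upvotes']
      .agg(mean_upvotes='mean', total_upvotes='sum', n_headlines='count')
      .sort_values('mean_upvotes', ascending=False)
)

# Word count whose headlines have the highest average upvotes.
best_wordcount = summary['mean_upvotes'].idxmax()

print(summary)
print(best_wordcount)
```

Keeping the count next to the mean makes it easier to spot word counts whose average rests on only a handful of headlines.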
