I have the following pandas DataFrame.
import pandas as pd
import numpy as np
df = pd.DataFrame([['Bay of Plenty', 'Bell Rd, Nukuhou','Nukuhou, Bay of Plenty'],[1.0, 0.5,1.0]]).T
df.columns = ['col1','col2']
col1 col2
0 Bay of Plenty 1
1 Bell Rd, Nukuhou 0.5
2 Nukuhou, Bay of Plenty 1
I want to get the following output.
col1 sum
Bay of Plenty 2.0
Nukuhou 1.5
Bell Rd 0.5
I tried the following.
df["splited"]=df["col1"].str.split(",")
df = (df.explode("splited").reset_index(drop=True))
col1 col2 splited
0 Bay of Plenty 1 Bay of Plenty
1 Bell Rd, Nukuhou 0.5 Bell Rd
2 Bell Rd, Nukuhou 0.5 Nukuhou
3 Nukuhou, Bay of Plenty 1 Nukuhou
4 Nukuhou, Bay of Plenty 1 Bay of Plenty
df.groupby(['splited']).sum().reset_index()
But this does not give the expected sums.
You can split by ", " (a comma followed by a space), so the space does not end up in the split values:
#whitespaces
print(df["col1"].str.split(",").tolist())
[['Bay of Plenty'], ['Bell Rd', ' Nukuhou'], ['Nukuhou', ' Bay of Plenty']]
^^^ ^^^
#no whitespaces
print(df["col1"].str.split(", ").tolist())
[['Bay of Plenty'], ['Bell Rd', 'Nukuhou'], ['Nukuhou', 'Bay of Plenty']]
df["splited"]=df["col1"].str.split(", ")
df = df.explode("splited")
df = df.groupby('splited')['col2'].sum().reset_index()
print(df)
splited col2
0 Bay of Plenty 2.0
1 Bell Rd 0.5
2 Nukuhou 1.5
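To see why the original attempt did not sum as expected, it may help to compare the group keys each separator produces. The following is a minimal sketch; the variable names `keys_comma` and `keys_comma_space` are made up for illustration, and the frame is built from a dict rather than with the transpose used in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Bay of Plenty", "Bell Rd, Nukuhou", "Nukuhou, Bay of Plenty"],
    "col2": [1.0, 0.5, 1.0],
})

# Splitting on "," alone keeps the space that followed each comma,
# so " Nukuhou" and "Nukuhou" become two different group keys.
keys_comma = df["col1"].str.split(",").explode()

# Splitting on ", " consumes that space as part of the separator.
keys_comma_space = df["col1"].str.split(", ").explode()

print(sorted(keys_comma.unique()))        # 5 distinct keys
print(sorted(keys_comma_space.unique()))  # 3 distinct keys
```

With the plain "," separator, groupby sees five distinct keys instead of three, which is why the values were not added together.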
Another idea is to use Series.str.strip to remove the leading and trailing whitespace from the split values:
df["splited"]=df["col1"].str.split(",")
df = df.explode("splited")
df = df.groupby(df['splited'].str.strip())['col2'].sum().reset_index()
print(df)
splited col2
0 Bay of Plenty 2.0
1 Bell Rd 0.5
2 Nukuhou 1.5
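One side note, offered as a hedged sketch rather than part of the original answer: because the question builds the frame with a transpose of mixed rows, col2 is normally stored with object dtype. Converting it to float first keeps the sum numeric. The names `parts` and `result` are made up here, and `ignore_index=True` assumes pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame([['Bay of Plenty', 'Bell Rd, Nukuhou', 'Nukuhou, Bay of Plenty'],
                   [1.0, 0.5, 1.0]]).T
df.columns = ['col1', 'col2']

# After the transpose the numbers are usually held as generic objects;
# make the column explicitly numeric before aggregating.
df['col2'] = df['col2'].astype(float)

# Split on ",", put each part on its own row, then strip the leftover spaces.
parts = df.assign(splited=df['col1'].str.split(',')).explode('splited', ignore_index=True)
parts['splited'] = parts['splited'].str.strip()

result = parts.groupby('splited', as_index=False)['col2'].sum()
print(result)
```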
EDIT:
If you need to split by a comma followed by one space or no space, a regex is possible:
df = pd.DataFrame([['Bay of Plenty', 'Bell Rd, Nukuhou',
'Nukuhou,Bay of Plenty'],[1.0, 0.5,1.0]]).T
df.columns = ['col1','col2']
df["splited"]=df["col1"].str.split(r",\s*")
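The regex split can then be carried through the same explode and groupby steps as before. Below is a minimal sketch under a few assumptions: the frame is built from a dict instead of the transpose, the result name `out` is made up, and the pattern is passed as a raw string so that `\s` is not read as a string escape. A pattern longer than one character is treated as a regular expression by Series.str.split by default.

```python
import pandas as pd

# Note the third row has no space after the comma.
df = pd.DataFrame({
    "col1": ["Bay of Plenty", "Bell Rd, Nukuhou", "Nukuhou,Bay of Plenty"],
    "col2": [1.0, 0.5, 1.0],
})

out = (
    # r",\s*" matches a comma plus any whitespace that follows it
    df.assign(splited=df["col1"].str.split(r",\s*"))
      .explode("splited")                      # one row per split value
      .groupby("splited", as_index=False)["col2"]
      .sum()
)
print(out)
```

Both "Bell Rd, Nukuhou" and "Nukuhou,Bay of Plenty" split cleanly this way, so the three place names are grouped the same as in the earlier examples.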