My code:
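import re
import pandas as pd

df = pd.read_csv("file")
l1 = []
l2 = []
for i in range(0, len(df['unions']), len(df['district'])):
    # Training text: union name followed by district name
    l1.append(' '.join((df['unions'][i], df['district'][i])))
    # Spans of the tokens in the union name, labeled with the subdistrict
    l2.append(({"entities": [[(ele.start(), ele.end() - 1) for ele in re.finditer(r'\S+', df['unions'][i])], df['subdistrict'][i]],}))
TRAIN_DATA = list(zip(l1, l2))
print(TRAIN_DATA)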
Result: [('Dhansagar Bagerhat', {'entities': [[(0, 8)], 'Sarankhola']})]
My expected output: [('Dhansagar Bagerhat', {'entities': [[(0, 8)], 'Sarankhola'],[[(10, 17)], 'AnyLabel']})]
How can I get output for all rows? I only get the result for one row, so it seems my loop isn't working. Can anyone point out my mistake?
My CSV file looks like this. "AnyLabel" is another column. There are about 500 rows:
unions subdistrict district
Dhansagar Sarankhola Bagerhat
Daibagnyahati Morrelganj Bagerhat
Ramchandrapur Morrelganj Bagerhat
Kodalia Mollahat Bagerhat
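To see why the question's loop stops after one row, here is what range() returns with those arguments. This uses a hypothetical row count of 4 (the size of the sample above) in place of len(df['unions']) and len(df['district']), which are always equal for a single DataFrame:

```python
# Hypothetical stand-in for len(df['unions']) == len(df['district']) on the 4-row sample.
n_rows = 4

# The question's call: step equals the row count, so only index 0 is produced.
buggy = list(range(0, n_rows, n_rows))

# Default step of 1: every row index is produced.
fixed = list(range(0, n_rows))

print(buggy)  # [0]
print(fixed)  # [0, 1, 2, 3]
```

With 500 rows the outcome is the same: range(0, 500, 500) yields only [0].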
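The loop runs only once because its step, len(df['district']), equals the number of rows, so range() yields nothing after index 0. Iterate over every row with df.iterrows() instead, and build the strings with str.join: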
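import re
import pandas as pd

df = pd.read_csv("file")
l1 = []
l2 = []
for idx, row in df.iterrows():
    # Training text: union name followed by district name
    l1.append(' '.join((row['unions'], row['district'])))
    # One [[start, end], token] pair per whitespace-separated token of "unions subdistrict"
    l2.append(({"entities": [[[ele.start(), ele.end() - 1], ele.group(0)] for ele in re.finditer(r'\S+', ' '.join([row['unions'], row['subdistrict']]))]}))
TRAIN_DATA = list(zip(l1, l2))
print(TRAIN_DATA)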
Output:
[('Dhansagar Bagerhat', {'entities': [[[0, 8], 'Dhansagar'], [[10, 19], 'Sarankhola']]}), ('Daibagnyahati Bagerhat', {'entities': [[[0, 12], 'Daibagnyahati'], [[14, 23], 'Morrelganj']]}), ('Ramchandrapur Bagerhat', {'entities': [[[0, 12], 'Ramchandrapur'], [[14, 23], 'Morrelganj']]}), ('Kodalia Bagerhat', {'entities': [[[0, 6], 'Kodalia'], [[8, 15], 'Mollahat']]})]
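In the output above, the second span is measured against unions + ' ' + subdistrict rather than against the training text (unions + ' ' + district). For 'Dhansagar Bagerhat', [10, 19] runs past the last index, 17. If the goal is the (10, 17) district span from the expected output, the offsets have to come from the same string that goes into l1. A minimal sketch, assuming the union is labeled with its subdistrict (as in the expected output) and the district with a per-row label from the "AnyLabel" column; build_train_data is a hypothetical helper name, and the entity layout is a well-formed guess at the expected output's intent:

```python
def build_train_data(unions, districts, subdistricts, district_labels):
    """Hypothetical helper: (text, {"entities": ...}) pairs, spans measured on text.

    End offsets are inclusive (end - 1), matching the question's code.
    district_labels is assumed to be the "AnyLabel" column.
    """
    train_data = []
    for union, district, subdistrict, label in zip(unions, districts, subdistricts, district_labels):
        text = " ".join((union, district))
        union_span = (0, len(union) - 1)
        district_start = len(union) + 1  # skip the joining space
        district_span = (district_start, len(text) - 1)
        entities = [[[union_span], subdistrict], [[district_span], label]]
        train_data.append((text, {"entities": entities}))
    return train_data


sample = build_train_data(
    ["Dhansagar", "Kodalia"],
    ["Bagerhat", "Bagerhat"],
    ["Sarankhola", "Mollahat"],
    ["AnyLabel", "AnyLabel"],
)
print(sample[0])
# ('Dhansagar Bagerhat', {'entities': [[[(0, 8)], 'Sarankhola'], [[(10, 17)], 'AnyLabel']]})
```

With the question's DataFrame the call would be build_train_data(df['unions'], df['district'], df['subdistrict'], df['AnyLabel']), assuming the label column is literally named AnyLabel. Zipping the columns walks every row, so all of them end up in the result.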