Currently, my code looks like this:
import pandas as pd

Version = {'2', '4', '6', '8', '10', '12', 'more'}
data = {'Version': ['some unwanted text 2 3 4 5', ' some more text 6 7 8 9 10', '12 more text 11 ']}
df = pd.DataFrame(data)

def Version_finder(x):
    df_words = set(x.split(' '))
    extract_words = Version.intersection(df_words)
    return ' '.join(extract_words)

df['New_Version'] = df.Version.apply(Version_finder)
The output is:
                      Version  New_Version
0  some unwanted text 2 3 4 5          4 2
1   some more text 6 7 8 9 10  6 10 more 8
2            12 more text 11       12 more
However, the desired output is:
                      Version  New_Version
0  some unwanted text 2 3 4 5            2
1   some more text 6 7 8 9 10         more
2            12 more text 11            12
**I only need a single value returned in the "New_Version" column. It must be the first value appearing in the Version column that is also in the set.**
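The idea is not to convert the split values to a set, because the order of elements in a set is undefined. Instead, filter the split words with a list comprehension, then use next with iter to return the first matching value, or None if nothing matches: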
f = lambda x: next(iter([y for y in x.split() if y in Version]), None)
df['New_Version'] = df.Version.apply(f)
print (df)
                      Version  New_Version
0  some unwanted text 2 3 4 5            2
1   some more text 6 7 8 9 10         more
2            12 more text 11            12
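
As a variation on the approach above, here is a minimal sketch that passes a generator expression to next instead of building a list first, so the scan stops at the first match. The helper name first_version and the constant VERSIONS are hypothetical and are not part of the original answer. The example uses plain Python strings so it runs without pandas.

```python
# Hypothetical names for illustration; VERSIONS mirrors the set from the question.
VERSIONS = {'2', '4', '6', '8', '10', '12', 'more'}

def first_version(text, versions=VERSIONS):
    """Return the first whitespace-separated token of text found in versions, else None."""
    # The generator expression is lazy, so next() stops at the first match
    # and the order of tokens in the original string is preserved.
    return next((token for token in text.split() if token in versions), None)

rows = ['some unwanted text 2 3 4 5', ' some more text 6 7 8 9 10', '12 more text 11 ']
results = [first_version(row) for row in rows]
print(results)  # ['2', 'more', '12']
```

Assuming the DataFrame from the question, the same function could be applied with df.Version.apply(first_version).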