Suppose I have the following strings:
"USD Notional Amount: USD 50,000,000.00"
"USD Fixed Rate Payer Currency Amount: USD 10,000,000"
"USD Fixed Rate Payer Payment Dates: Annually"
"KRW Fixed Rate Payer Payment Dates: Annually"
A plain split() gives me this:
import pandas as pd

df = pd.DataFrame(["USD Notional Amount: USD 50,000,000.00",
                   "USD Fixed Rate Payer Currency Amount: USD 10,000,000",
                   "USD Fixed Rate Payer Payment Dates: Annually",
                   "KRW Fixed Rate Payer Payment Dates: Annually"])
df[0].apply(lambda x: x.split())
[Output]
0 [USD, Notional, Amount:, USD, 50,000,000.00]
1 [USD, Fixed, Rate, Payer, Currency, Amount:, USD, 10,000,000]
2 [USD, Fixed, Rate, Payer, Payment, Dates:, Annually]
3 [KRW, Fixed, Rate, Payer, Payment, Dates:, Annually]
I have a list of multi-word phrases that I want to keep intact:
words_list = ["Notional Amount:","Fixed Rate Payer Currency Amount:","Fixed Rate Payer Payment Dates:"]
What I want is to split each string into a list of strings like this:
["USD","Notional Amount:","USD", "50,000,000.00"]
["USD","Fixed Rate Payer Currency Amount:","USD","10,000,000"]
["USD","Fixed Rate Payer Payment Dates:","Annually"]
["KRW","Fixed Rate Payer Payment Dates:","Annually"]
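When I split these strings I want to keep certain phrases together, so I cannot always split on whitespace. Does anyone know how to do this kind of string splitting in Python? Any ideas?
As Xhattam said, there may be no generic way to do what you want.
However, assuming you know which space-containing strings you do not want split, you can do the following (using your example):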
test = "USD Notional Amount: USD 50,000,000.00"
a = ['Notional Amount:', 'Fixed Rate Payer Currency Amount:', 'Fixed Rate Payer Payment Dates:']
# Fall back to a plain whitespace split if no phrase is found
my_list = test.split()
for element in a:
    if element in test:
        # Remove the phrase from the string, then split what is left.
        # split() with no argument also collapses the leftover double space.
        my_list = test.replace(element, '').split()
        # Put the phrase back at the wanted index
        # (index 1 because it is the second item in every example string)
        my_list.insert(1, element)
        break
Now you should be able to print my_list and get the following result:
print(my_list)
['USD', 'Notional Amount:', 'USD', '50,000,000.00']
This is a specific example that you can easily adapt to the other strings.
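If hard-coding the insert position becomes awkward, one possible generalization is to let a regular expression do the matching: try each known phrase first and fall back to an ordinary whitespace-delimited token. This is only a sketch, not part of the answer above. The function name split_keeping_phrases is made up for illustration, and it assumes each phrase appears in the text exactly as written in the list (same spacing and case).

```python
import re

def split_keeping_phrases(text, phrases):
    """Split text on whitespace, but keep each known phrase as one item.

    Hypothetical helper for illustration only.
    """
    # Try longer phrases first so a long phrase wins over a shorter one
    # that happens to be its prefix.
    ordered = sorted(phrases, key=len, reverse=True)
    parts = []
    if ordered:
        alternatives = "|".join(re.escape(p) for p in ordered)
        # A phrase only counts when it is bounded by whitespace or by the
        # start/end of the string on both sides.
        parts.append(r"(?<!\S)(?:" + alternatives + r")(?!\S)")
    # Otherwise take any run of non-whitespace characters.
    parts.append(r"\S+")
    return re.findall("|".join(parts), text)

words_list = ["Notional Amount:",
              "Fixed Rate Payer Currency Amount:",
              "Fixed Rate Payer Payment Dates:"]

print(split_keeping_phrases("USD Notional Amount: USD 50,000,000.00", words_list))
# ['USD', 'Notional Amount:', 'USD', '50,000,000.00']
```

With the DataFrame from the question, the same function could be applied per row, for example df[0].apply(lambda x: split_keeping_phrases(x, words_list)).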