Split a string column and extract the second part in Python

Posted by ahbon in Dev

ahbon

Suppose I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({"id": range(4), "price": ["15dollar/m2/day", "90dollar/m2/month", "18dollar/m2/day", "100dollar/m2/month"]})

   id               price
0   0     15dollar/m2/day
1   1   90dollar/m2/month
2   2     18dollar/m2/day
3   3  100dollar/m2/month

I want to split the price column into two new columns, unit_price and price_unit, as follows:

   id     unit_price  price_unit
0   0        15.0    dollar/m2/day
1   1        90.0    dollar/m2/month
2   2        18.0    dollar/m2/day
3   3       100.0    dollar/m2/month

Here is my attempt:

df['unit_price'] = df['price'].str.split('dollar').str[0].astype(float)
#df['unit_price'] = df['price'].str.extract(r'(\d*\.\d+|\d+)', expand=False).astype(float)
df['price_unit'] = df['price'].str.split('dollar').str[1]
del df['price']

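This works fine for the unit_price column, but for price_unit, splitting on dollar gives the result below, which no longer contains the word dollar. Alternatively, if I use df['price'].str.replace(r'\d', ''), every digit is removed, including the 2 in m2. How can I do this correctly in Python? Thanks.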

df['price_unit']
Out[474]: 
0      /m2/day
1    /m2/month
2      /m2/day
3    /m2/month
Name: price_unit, dtype: object

jezrael

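You can use Series.str.extract with a regex: ^ anchors the match at the start of the string, \d*\.\d+ matches a float, \d+ matches an integer, and .* captures everything that follows: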

df = df.join(df.pop('price').str.extract(r'(?P<unit_price>^\d*\.\d+|^\d+)(?P<price_unit>.*)'))
print(df)
   id unit_price       price_unit
0   0         15    dollar/m2/day
1   1         90  dollar/m2/month
2   2         18    dollar/m2/day
3   3        100  dollar/m2/month
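
Note that str.extract returns strings, so unit_price in the output above is not yet numeric. If the float column shown in the question is wanted, one possible sketch is to cast after extracting. It assumes every row starts with a number, and the intermediate name parts is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(4),
    "price": ["15dollar/m2/day", "90dollar/m2/month",
              "18dollar/m2/day", "100dollar/m2/month"],
})

# Named groups become the column names of the extracted frame.
pat = r'^(?P<unit_price>\d*\.\d+|\d+)(?P<price_unit>.*)'
parts = df.pop('price').str.extract(pat)

# extract() yields strings; cast the numeric part to float.
# Assumption: no row lacks a leading number (such rows would become NaN).
parts['unit_price'] = parts['unit_price'].astype(float)

df = df.join(parts)
print(df)
```

Casting on the intermediate frame keeps the original df untouched until the final join.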

My first solution used extract to capture the number and replace to strip it from the string:

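pat = r'(^\d*\.\d+|^\d+)'
df['unit_price'] = df['price'].str.extract(pat, expand=False)
# regex=True is required: newer pandas versions treat the pattern as a literal string by default
df['price_unit'] = df.pop('price').str.replace(pat, '', regex=True)
print(df)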
   id unit_price       price_unit
0   0         15    dollar/m2/day
1   1         90  dollar/m2/month
2   2         18    dollar/m2/day
3   3        100  dollar/m2/month
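
To see what the pattern does on a single value, the same regex can be tried with the standard re module. This is only an illustration: split_price is a hypothetical helper name, not part of pandas, and the second sample value is made up to exercise the float branch.

```python
import re

# Same idea as pat above: a leading float or integer, then the rest.
PAT = re.compile(r'^(\d*\.\d+|\d+)(.*)')

def split_price(text):
    """Return (number, unit), or (None, text) if there is no leading number."""
    m = PAT.match(text)
    if m is None:
        return None, text
    return float(m.group(1)), m.group(2)

print(split_price("15dollar/m2/day"))     # (15.0, 'dollar/m2/day')
print(split_price("2.5dollar/m2/month"))  # (2.5, 'dollar/m2/month')
```

Because the number is matched only at the start of the string, the 2 in m2 is left alone, unlike with a blanket \d replacement.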


Edited on 2021-01-1
