Efficiently creating a multidimensional array from a list of strings that need .split(',')

roganjosh, posted on Dev

roganjosh

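I'm trying to move a simple calculation that currently lives in a for loop into a numpy array. In this case, it is a calculation on a list of strings of the following form: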

strings = ['12,34', '56,78'...]

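I need to:

1. Split each string on the comma delimiter and produce two integers, e.g.
   strings = [[12, 34], [56, 78]...]
2. Filter this nested list down to only the members that satisfy some arbitrary condition, e.g. both numbers in the sublist fall within a specific range.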

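I'm trying to get familiar with the numpy library, but I can't benefit from its faster computation without adding overhead for processing the initial list. For example, my instinct is to do the split() and int() conversions in Python before creating the array, but that ends up being more expensive than the simple for loop.

Beyond that, I can't seem to combine in numpy the various operations needed to do this on an array created from the initial list. Is there a sane way to do this, or is it a lost cause for something like this, where the array is only used once?

Note: there is an old answer on SO suggesting that the string operations should be done in Python, but it doesn't compare run times and may now be out of date.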

Comparison of my attempts:

import random
import datetime as dt
import numpy as np

raw_locs = [str(random.randint(1, 100)) + ',' + str(random.randint(1, 100))
            for x in range(100000)]

if __name__ == '__main__':

    # Python approach
    start1 = dt.datetime.now()
    results = []
    for point in raw_locs:
        lon, lat = point.split(",")
        lat = int(lat)
        lon = int(lon)
        if 0 <= lon <= 50 and 50 <= lat <= 100:
            results.append(point)
    end1 = dt.datetime.now()

    # Python list comprehension prior to numpy array
    start2 = dt.datetime.now()
    converted_list = [list(map(int, item.split(','))) for item in raw_locs]
    end2 = dt.datetime.now()

    # List comprehension + numpy array creation
    start3 = dt.datetime.now()
    arr = np.array([list(map(int, item.split(','))) for item in raw_locs])
    end3 = dt.datetime.now()

    # Vectorised filter: keep rows with 0 <= lon <= 50 and 50 <= lat <= 100
    start4 = dt.datetime.now()
    results2 = arr[((0 <= arr[:, 0]) & (arr[:, 0] <= 50)
                    & (50 <= arr[:, 1]) & (arr[:, 1] <= 100))]
    end4 = dt.datetime.now()

    # Print results
    print("Pure python for whole solution took:                {}".format(end1 - start1))
    print("Just python list comprehension prior to array took: {}".format(end2 - start2))
    print("Comprehension + array creation took:                {}".format(end3 - start3))
    print("Numpy actual calculation took:                      {}".format(end4 - start4))
    print("Total numpy time:                                   {}".format(end4 - start3))

Andras Deak

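While I think your timings would be more accurate if you used something like the timeit module, I believe the biggest problem is that you are parsing a list of strings. NumPy's built-in methods handle either kind of input well; the bottleneck is the string parsing itself. Note that in your numpy case, the input to np.array() is a list comprehension that still does all of the split() and int() work in Python.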

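Here is my suggestion: join your list of strings with commas to get a single comma-separated string, parse it with numpy.fromstring, then reshape the result into two columns: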

arr = np.fromstring(','.join(raw_locs),sep=',').reshape(-1,2)
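Putting the pieces together, here is a minimal sketch of the whole pipeline: the join + fromstring line above for parsing, followed by the same boolean-mask filter the question uses. The function name parse_and_filter and its lon_range/lat_range parameters are hypothetical; their defaults are taken from the question's condition.

```python
import numpy as np


def parse_and_filter(strings, lon_range=(0, 50), lat_range=(50, 100)):
    """Hypothetical helper: parse 'lon,lat' strings and keep rows inside both ranges."""
    # Parse everything in one call on a single joined string, then make two columns.
    arr = np.fromstring(",".join(strings), sep=",").reshape(-1, 2)
    lon, lat = arr[:, 0], arr[:, 1]
    # Same condition as the question: lo <= lon <= hi and lo <= lat <= hi.
    mask = ((lon_range[0] <= lon) & (lon <= lon_range[1])
            & (lat_range[0] <= lat) & (lat <= lat_range[1]))
    return arr[mask]


if __name__ == "__main__":
    sample = ["12,34", "56,78", "40,90", "50,50"]
    print(parse_and_filter(sample).tolist())  # [[40.0, 90.0], [50.0, 50.0]]
```

Both the parsing and the filtering happen in NumPy calls here; the only Python-level work left is the str.join.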

Timings on my laptop, with the above added:

Pure python for whole solution took:                0:00:00.128965
Just python list comprehension prior to array took: 0:00:00.156092
Comprehension + array creation took:                0:00:00.186023
Join + fromstring took:                             0:00:00.035040
Numpy actual calculation took:                      0:00:00.001355
Total numpy time:                                   0:00:00.222454
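Since the answer points out that timeit would give more accurate numbers than datetime.now(), here is a minimal sketch of such a comparison. It assumes Python 3 and a NumPy version in which np.fromstring still accepts sep; the function names per_item_parse and join_and_fromstring are hypothetical, and no timings are claimed here because they depend on the machine.

```python
import random
import timeit

import numpy as np

# Same kind of input as the question: "lon,lat" strings with values in 1..100.
raw_locs = [f"{random.randint(1, 100)},{random.randint(1, 100)}"
            for _ in range(100_000)]


def per_item_parse(strings):
    """Split and int()-convert each string in Python, then build the array."""
    return np.array([[int(v) for v in s.split(",")] for s in strings])


def join_and_fromstring(strings):
    """Join into one comma-separated string and let NumPy parse it in one call."""
    return np.fromstring(",".join(strings), sep=",").reshape(-1, 2)


if __name__ == "__main__":
    runs = 3
    for func in (per_item_parse, join_and_fromstring):
        # timeit.repeat returns one total per repeat; the minimum is the least noisy.
        best = min(timeit.repeat(lambda: func(raw_locs), repeat=3, number=runs))
        print(f"{func.__name__}: {best / runs:.4f} s per call")
```

Taking the minimum over repeats rather than the mean follows the advice in the timeit documentation, since higher values are usually caused by other processes interfering with the measurement.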

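Note that the fromstring one-liner above creates an array of dtype numpy.float64 by default, even though your input consists of integers. If you want the array to hold integer values, you can pass the dtype=np.int64 keyword argument to fromstring explicitly.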
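A small sketch of that dtype difference, using two strings in the question's format:

```python
import numpy as np

joined = ",".join(["12,34", "56,78"])

# Default dtype: the parsed numbers come back as float64.
as_float = np.fromstring(joined, sep=",").reshape(-1, 2)
# Explicit integer dtype, as suggested in the answer.
as_int = np.fromstring(joined, dtype=np.int64, sep=",").reshape(-1, 2)

print(as_float.dtype, as_int.dtype)  # float64 int64
print(as_int.tolist())               # [[12, 34], [56, 78]]
```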


Edited 2020-11-4
