Functional programming: how do I efficiently build multiple lists from one iterator in a single pass?

Posted by Bill Huneke in Dev

Bill Huneke

I'm somewhat new to Python...

Scenario

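This is the general problem. Suppose I only want to read my input once, and suppose it is really large. Maybe I have many filters, transformations, reductions, whatever, applied to this stream. At the end, I want to produce a modestly sized list and hand it off to something else as the result of the analysis.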

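If I want to create one list, I'm in good shape. I encode the logic above as iterable operations and feed it through a pipeline of tools such as filter(). A list comprehension then builds the result list efficiently.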
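As a minimal sketch of that one-list case (the stages and the predicate below are hypothetical placeholders), every stage stays lazy until a single list comprehension drives the whole pipeline:

```python
def one_list(stream):
    # Hypothetical pipeline stages: drop the "hated" number 98, then square
    stream = filter(lambda x: x != 98, stream)
    stream = map(lambda x: x * x, stream)
    # Nothing has been computed yet; the comprehension below pulls every
    # item through all stages in one pass and materializes one list
    return [x for x in stream if x % 2 == 0]


print(one_list(iter([98, 2, 3, 4])))  # [4, 16]
```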

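But what if I'm less modest and want two lists as output? For example, I want everything that is "false" (by some criterion) in one list and everything that is "true" in another. Maybe I want three lists...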

Inadequate solutions

In this situation, I have two options:

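Iterate over my input to produce the first list, save that output, then iterate again to produce the second output list
Create two empty lists, manually iterate over the pipeline's output, and append to the appropriate list at each step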

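Compared with a list comprehension, both options are unappealing. The first makes multiple passes. The second calls append() repeatedly, which (I assume) is slow, and it is a construct spelled out in arbitrary Python rather than a clean, optimizable single statement.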
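The two options can be sketched as follows (`pipeline` is a hypothetical stand-in for the real chain of filters and transformations). Option 1 needs a way to re-create the input, because an iterator can only be read once; option 2 reads the input once but does the appending by hand:

```python
def pipeline(raw):
    # Hypothetical stand-in for the question's filter/transform chain
    return (x * 2 for x in raw if x != 98)


def two_passes(make_raw):
    # Option 1: the stream is single-use, so the input must be re-created
    # (here via a zero-argument factory) for each pass
    fours = [x for x in pipeline(make_raw()) if x == 4]
    others = [x for x in pipeline(make_raw()) if x != 4]
    return fours, others


def one_pass(raw):
    # Option 2: one pass, appending each item to the matching list
    fours, others = [], []
    for x in pipeline(raw):
        (fours if x == 4 else others).append(x)
    return fours, others


data = [2, 98, 1, 2, 3]
print(two_passes(lambda: iter(data)))  # ([4, 4], [2, 6])
print(one_pass(iter(data)))            # ([4, 4], [2, 6])
```

Both produce the same lists; the difference is that option 1 runs the whole pipeline twice, which is exactly what the question wants to avoid for a large input.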

Existing modules?

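I looked through the itertools and collections modules and glanced at numpy. I saw a few things that could do the above, but their documentation explains that they are convenience functions that involve buffering and similar issues, so they don't meet my requirements.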
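The question doesn't name the functions it found; one well-known candidate is the `partition` recipe from the itertools documentation, built on `itertools.tee`. The sketch below reproduces that recipe to show where the buffering comes from: whichever output is consumed first forces `tee` to keep, in memory, every item the other output hasn't seen yet. The `is_four` predicate is hypothetical, and note that the predicate runs twice per item.

```python
from itertools import filterfalse, tee


def partition(pred, iterable):
    # Modeled on the `partition` recipe in the itertools docs:
    # returns (items where pred is false, items where pred is true)
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)


def is_four(x):
    return x == 4


others_it, fours_it = partition(is_four, iter([4, 1, 4, 2, 3]))
# Materializing `others` first runs the shared source to the end, so tee
# has to buffer every item that `fours_it` has not consumed yet
others = list(others_it)
fours = list(fours_it)
print(others, fours)  # [1, 2, 3] [4, 4]
```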

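I like Python's functional style, iterators and generators. I feel I have a good understanding of the benefits of iterators, even when their input isn't a file. I appreciate the subtle difficulties (for example, buffering) that can arise when reading several iterators at once, where some may be "slow inputs" and others "fast inputs".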

In my case, I only want to consume one iterator. This situation has come up for me several times over the past few years.

An example to sum up

# python 3
# Toy example. Just for reading, not worth running
import random
import itertools


num_samples = 1000000
least_favorite_number = 98


def source(count):
    for _ in range(count):
        yield random.randint(1, 100)


def my_functional_process(stream):
    """ Do silly things to an input iterable of ints, return an iterable of int pairs"""
    # Remove the hated number
    stream = itertools.filterfalse(lambda x: x == least_favorite_number, stream)

    # For each number, take note of which number preceded it in the stream
    def note_ancestor(l):
        prec = None
        for x in l:
            yield x, prec
            prec = x

    stream = note_ancestor(stream)

    # I don't like it even when you and your ancestor add up to our
    # least favorite number or if you have no ancestor
    stream = itertools.filterfalse(
        lambda x: x[1] is None or x[0] + x[1] == least_favorite_number,
        stream
    )

    # Good job
    return stream


def single_pass_the_slow_way():
    """
    Read through the iterator in a single pass, but build result in a way that I think is slow
    """
    the_fours = []
    not_fours = []

    stream = source(num_samples)
    processed = my_functional_process(stream)

    for x in processed:
        if x[0] == 4:
            the_fours.append(x)
        else:
            not_fours.append(x)

    return the_fours, not_fours


def single_pass_and_fast():
    """
    In this function, we make a single pass but create multiple lists using
    imaginary syntax.
    """
    stream = source(num_samples)
    processed = my_functional_process(stream)

    # In my dream, Python figures out to run these comprehensions in parallel
    # In reality, is there even a syntax to represent this?? Obviously, the
    # below does not do it
    not_real_code = [
        # just making up syntax here
        #        [x for x in ~x~ if x == 4],
        #        [x for x in ~x~ if x != 4]
        x for x in processed
    ]

    # These should be a list of fours, and all others respectively
    return not_real_code[0], not_real_code[1]


i_want_it = 'slow'

if i_want_it == 'slow':
    fours, others = single_pass_the_slow_way()
    print("We're done. ready to use those lists")
else:
    fours, others = single_pass_and_fast()
    print("We're done a bit faster. ready to use those lists")
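Since the question also mentions wanting three lists, here is a sketch that generalizes the single-pass loop to any number of output lists. It assumes a hypothetical `classify` function that maps each item to a bucket key; `collections.defaultdict(list)` creates each list the first time its key appears.

```python
from collections import defaultdict


def split_by(classify, iterable):
    # One pass over the input; each item goes to the list for its key
    buckets = defaultdict(list)
    for item in iterable:
        buckets[classify(item)].append(item)
    return buckets


def size_class(x):
    # Hypothetical three-way classifier
    if x < 10:
        return "small"
    if x < 100:
        return "medium"
    return "large"


groups = split_by(size_class, iter([5, 50, 500, 7, 70]))
print(groups["small"], groups["medium"], groups["large"])  # [5, 7] [50, 70] [500]
```

This still uses append() under the hood; it only moves the bookkeeping into one reusable function.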

布鲁佐西

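I ran into a similar problem some time ago. I found answers somewhere on this site and used them in my code. I can't find the original question; if anyone finds it, please post the link in a comment below so it can be integrated into this answer.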

There are two ways to do this:

'''
Let p1 be a function that checks if x has the property to belong to lst1
Let s be the list/iterator you want to iterate through
''' 

# 1st way - one loop

lst1, lst2 = [], []
for x in s:

    target = lst1 if p1(x) else lst2
    target.append(x)

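# 2nd way - one list comprehension (not recommended: it builds an
# intermediate list of pairs, silently drops any element that is itself
# None, and raises ValueError if s is empty)

lst1, lst2 = [[x for x in cur_list if x is not None]
               for cur_list in zip(*[(x, None) if p1(x) else (None, x)
                                     for x in s])]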
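To try the two snippets above, `p1` and `s` need concrete values; the predicate and inputs below are hypothetical. The second input shows the main pitfall of the second way: because `None` serves as a placeholder, an element that is itself `None` disappears.

```python
def p1(x):
    # Hypothetical predicate: even numbers belong to lst1, None does not
    return x is not None and x % 2 == 0


def way1(s):
    lst1, lst2 = [], []
    for x in s:
        target = lst1 if p1(x) else lst2
        target.append(x)
    return lst1, lst2


def way2(s):
    lst1, lst2 = [[x for x in cur_list if x is not None]
                  for cur_list in zip(*[(x, None) if p1(x) else (None, x)
                                        for x in s])]
    return lst1, lst2


print(way1(range(6)))  # ([0, 2, 4], [1, 3, 5])
print(way2(range(6)))  # ([0, 2, 4], [1, 3, 5])

data = [1, None, 2]
print(way1(data))      # ([2], [1, None])
print(way2(data))      # ([2], [1])  <- the None element is lost
```

In addition, `way2([])` raises `ValueError`, because `zip(*[])` produces nothing to unpack into two lists.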

Now, I assume you are after speed, so let's check which one is faster with a toy example (in any case, you can run the same check on your real code):

import timeit

# The statement strings must not be indented: timeit compiles them as
# top-level code, and leading indentation raises IndentationError.
code1 = """
lst1, lst2 = [], []
for x in range(1_000_000):
    target = lst1 if x % 3 else lst2
    target.append(x)
"""
elapsed_time1 = timeit.timeit(code1, number=100) / 100
print(elapsed_time1)

code2 = """
lst1, lst2 = [[x for x in cur_list if x is not None]
              for cur_list in zip(*[(x, None) if x % 3 else (None, x)
                                    for x in range(1_000_000)])]
"""
elapsed_time2 = timeit.timeit(code2, number=100) / 100
print(elapsed_time2)

Results (average seconds per run):

0.307000948000001
0.779959973

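This shows that, as mentioned in the comments, the single loop using .append is faster: roughly 2.5 times faster on this toy example.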
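Applied to the question's toy example, the one-loop approach would replace the body of `single_pass_the_slow_way`. The sketch below is a variant, not the answer's exact code: it looks up each list's bound `append` method once, a common micro-optimization that skips the attribute lookup on every item. Whether that helps measurably on real data is something to time, not assume. The `split_fours` name is hypothetical.

```python
def split_fours(pairs):
    # One pass over the (value, ancestor) pairs produced by the pipeline
    the_fours, not_fours = [], []
    # Look up each bound append method once instead of on every item
    add_four, add_other = the_fours.append, not_fours.append
    for pair in pairs:
        (add_four if pair[0] == 4 else add_other)(pair)
    return the_fours, not_fours


fours, others = split_fours(iter([(4, 1), (5, 4), (4, 4)]))
print(fours)   # [(4, 1), (4, 4)]
print(others)  # [(5, 4)]
```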


Edited 2021-01-22
