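I'm fairly new to Python...
Scenario
This is a general question. Suppose I want to read my input only once, and suppose it is really large. Maybe I have many filters, transformations, reductions, or whatever else to apply to this stream. In the end I want to produce a moderately sized list and pass it to some other object as the result of the analysis.
If I want to create a single list, I'm in good shape. I encode the logic above as operations on iterables and feed them into a pipeline of tools such as filter(). A list comprehension can then build the result list efficiently.
But what if I have the modest additional requirement that I want two lists as output? For example, I want every item for which some predicate is false in one list and every item for which it is true in another. Or maybe I want three lists...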
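Inadequate solutions
In this situation I have two options, and both are unappealing compared with a list comprehension. The first makes multiple passes over the input. The second makes a single pass but repeatedly calls append(), which is slow (or so I believe) and is a construct living in arbitrary Python code rather than a single clean, optimizable statement.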
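Existing modules?
I looked through the itertools and collections modules and took a peek at numpy. I saw a few things that could do the above, but their documentation explains that they are convenience functions that can cause buffering and similar problems, so they don't meet my requirements.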
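For reference, one itertools-based approach that matches this description is a `partition` helper built on `itertools.tee`. This is only a sketch under the assumption that this is the kind of tool meant above; the names `partition` and `is_four` are illustrative and not from the original post.

```python
import itertools

def partition(predicate, iterable):
    """Split one iterable into (false_items, true_items) lazily via tee."""
    t1, t2 = itertools.tee(iterable)
    return itertools.filterfalse(predicate, t1), filter(predicate, t2)

def is_four(x):
    # Hypothetical predicate, for illustration only
    return x == 4

others, fours = partition(is_four, iter([4, 1, 4, 2, 3]))

# Draining one branch completely before touching the other forces tee
# to buffer every item internally for the branch that lags behind.
fours_list = list(fours)
others_list = list(others)
print(fours_list, others_list)
```

The source is still read only once, but the predicate runs twice per item and tee may hold the whole stream in memory, which is the buffering cost mentioned above.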
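I like Python's functional style, iterators, and generators. I feel I have a good understanding of the benefits of iterators, even when the input is not a file. I appreciate the subtle difficulties that can arise when reading from several iterators at once (buffering, for example), where some may be "slow" inputs and others "fast" ones.
In my case I only want to consume one iterator. This situation has come up for me several times over the past few years.
An example to summarize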
# python 3
# Toy example. Just for reading, not worth running
import random
import itertools
num_samples = 1000000
least_favorite_number = 98

def source(count):
    for _ in range(count):
        yield random.randint(1, 100)

def my_functional_process(stream):
    """ Do silly things to an input iterable of ints, return an iterable of int pairs"""
    # Remove the hated number
    stream = itertools.filterfalse(lambda x: x == least_favorite_number, stream)

    # For each number, take note of which number preceded it in the stream
    def note_ancestor(l):
        prec = None
        for x in l:
            yield x, prec
            prec = x

    stream = note_ancestor(stream)
    # I don't like it even when you and your ancestor add up to our
    # least favorite number or if you have no ancestor
    stream = itertools.filterfalse(
        lambda x: x[1] is None or x[0] + x[1] == least_favorite_number,
        stream
    )
    # Good job
    return stream

def single_pass_the_slow_way():
    """
    Read through the iterator in a single pass, but build result in a way that I think is slow
    """
    the_fours = []
    not_fours = []
    stream = source(num_samples)
    processed = my_functional_process(stream)
    for x in processed:
        if x[0] == 4:
            the_fours.append(x)
        else:
            not_fours.append(x)
    return the_fours, not_fours

def single_pass_and_fast():
    """
    In this function, we make a single pass but create multiple lists using
    imaginary syntax.
    """
    stream = source(num_samples)
    processed = my_functional_process(stream)
    # In my dream, Python figures out to run these comprehensions in parallel
    # In reality, is there even a syntax to represent this?? Obviously, the
    # below does not do it
    not_real_code = [
        # just making up syntax here
        # [x for x in ~x~ if x == 4],
        # [x for x in ~x~ if x != 4]
        x for x in processed
    ]
    # These should be a list of fours, and all others respectively
    return not_real_code[0], not_real_code[1]

i_want_it = 'slow'
if i_want_it == 'slow':
    fours, others = single_pass_the_slow_way()
    print("We're done. ready to use those lists")
else:
    fours, others = single_pass_and_fast()
    print("We're done a bit faster. ready to use those lists")
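I ran into a similar problem a while ago. I found the answers somewhere on this site and used them in my code. I can't find the original question; if anyone finds the link, please post it in a comment below so it can be incorporated into this answer.
There are two ways to do this: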
'''
Let p1 be a function that checks if x has the property to belong to lst1
Let s be the list/iterator you want to iterate through
'''
# 1st way - one loop
lst1, lst2 = [], []
for x in s:
    target = lst1 if p1(x) else lst2
    target.append(x)

# 2nd way - one list comprehension (Not recommended)
# Caveat: this drops items that are themselves None and fails if s is empty
lst1, lst2 = [
    [x for x in cur_list if x is not None]
    for cur_list in zip(*[(x, None) if p1(x) else (None, x) for x in s])
]
Now, I assume you are after speed, so let's check which one is faster with a toy example (you can always check against your real code):
import timeit

code1 = """
lst1, lst2 = [], []
for x in range(1_000_000):
    target = lst1 if x%3 else lst2
    target.append(x)
"""
elapsed_time1 = timeit.timeit(code1, number=100)/100
print(elapsed_time1)

code2 = """
lst1, lst2 = [
    [x for x in cur_list if x is not None]
    for cur_list in zip(*[(x,None) if x%3 else (None,x)
                          for x in range(1_000_000)])
]
"""
elapsed_time2 = timeit.timeit(code2, number=100)/100
print(elapsed_time2)
Results
0.307000948000001
0.779959973
This shows that, as noted in the comments, the single loop using .append is the faster of the two.
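The question also mentions wanting three or more lists. The single-loop approach extends to any number of outputs by routing each item to a bucket chosen by a key function. The following is a minimal sketch rather than part of the original answer; `split_by_key` and `classify` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

def split_by_key(iterable, key):
    """Consume `iterable` once and group items into lists keyed by key(item)."""
    buckets = defaultdict(list)
    for item in iterable:
        buckets[key(item)].append(item)
    return buckets

def classify(x):
    # Hypothetical three-way rule, for illustration only
    if x == 4:
        return "four"
    return "even" if x % 2 == 0 else "odd"

groups = split_by_key(iter([4, 1, 6, 4, 3]), classify)
print(groups["four"], groups["even"], groups["odd"])
```

This still makes one pass and one append() call per item, so its cost should be comparable to the first approach above, plus one dictionary lookup per item.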