Given the following pd.DataFrame:
pd.DataFrame({'2010':[0, 45, 5], '2011': [12, 56, 0], '2012': [11, 22, 0], '2013': [0, 5, 0], '2014': [0, 0, 0]})
   2010  2011  2012  2013  2014
1     0    12    11     0     0
2    45    56    22     5     0
3     5     0     0     0     0
I want to count the lengths of the runs of consecutive zeros in each row:
1 [1, 2]
2 [1]
3 [4]
I'm looking for different efficient ways to do this.
For efficiency, I'd suggest a pure NumPy approach -
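import numpy as np

def islandlen_perrow(df, trigger_val=0):
    # Boolean mask of the cells equal to the trigger value
    a = df.values == trigger_val
    # Pad each row with False on both sides so every run has a start and an end
    pad = np.zeros((a.shape[0], 1), dtype=bool)
    mask = np.hstack((pad, a, pad))
    # True wherever the mask changes, i.e. at run starts and run ends
    mask_step = mask[:, 1:] != mask[:, :-1]
    # Flattened positions alternate start, end, start, end, ...
    idx = np.flatnonzero(mask_step)
    island_lens = idx[1::2] - idx[::2]
    # Each run contributes two steps, so halve the per-row step count
    n_islands_perrow = mask_step.sum(1) // 2
    # Split the flat run lengths back into one array per row
    out = np.split(island_lens, n_islands_perrow[:-1].cumsum())
    return out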
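To see why the padding and the step mask give run lengths, here is a small trace of the same steps on a single row (the first row of the sample frame). This is only an illustrative sketch; the names `row`, `starts`, `ends` and `run_lengths` are introduced here and are not part of the function above.

```python
import numpy as np

# One example row: zeros at positions 0, 3 and 4.
row = np.array([[0, 12, 11, 0, 0]])

a = row == 0                                   # [[T F F T T]]
pad = np.zeros((a.shape[0], 1), dtype=bool)
mask = np.hstack((pad, a, pad))                # [[F T F F T T F]]

# A step is any position where neighbouring mask values differ.
mask_step = mask[:, 1:] != mask[:, :-1]        # [[T T F T F T]]
idx = np.flatnonzero(mask_step)                # [0 1 3 5]

# Even entries of idx are run starts, odd entries are run ends.
starts, ends = idx[::2], idx[1::2]
run_lengths = ends - starts
print(run_lengths)                             # [1 2]
```

Because every row is padded with False on both sides, each row always contributes an even number of steps, so the start/end alternation still holds after flattening across rows.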
Sample run -
In [69]: df
Out[69]:
   2010  2011  2012  2013  2014
0     0    12    11     0     0
1    45    56    22     5     0
2     5     0     0     0     0
In [70]: islandlen_perrow(df, trigger_val=0)
Out[70]: [array([1, 2], dtype=int64), array([1], dtype=int64), array([4], dtype=int64)]
In [76]: pd.Series(islandlen_perrow(df, trigger_val=0))
Out[76]:
0    [1, 2]
1       [1]
2       [4]
dtype: object
Timings on a large array -
In [77]: df = pd.DataFrame(np.random.randint(0,4,(1000,1000)))
In [78]: from itertools import groupby
# @Daniel Mesejo's soln
In [79]: def count_zeros(x):
...:     return [sum(1 for _ in group) for key, group in groupby(x, key=lambda i: i == 0) if key]
In [80]: %timeit df.apply(count_zeros, axis=1)
1 loop, best of 3: 228 ms per loop
# @coldspeed's soln-1
In [84]: %%timeit
...: v = df.stack()
...: m = v.eq(0)
...:
...: (m.ne(m.shift())
...: .cumsum()
...: .where(m)
...: .dropna()
...: .groupby(level=0)
...: .apply(lambda x: x.value_counts(sort=False).tolist()))
1 loop, best of 3: 516 ms per loop
# @coldspeed's soln-2
In [88]: %%timeit
...: v = df.stack()
...: m = v.eq(0)
...: (m.ne(m.shift())
...: .cumsum()
...: .where(m)
...: .dropna()
...: .groupby(level=0)
...: .value_counts(sort=False)
...: .groupby(level=0)
...: .apply(list))
1 loop, best of 3: 343 ms per loop
# @jpp's soln
In [90]: %timeit [[len(list(grp)) for flag, grp in groupby(row, key=bool) if not flag] \
...: for row in df.values]
1 loop, best of 3: 334 ms per loop
# @J. Doe's soln
In [94]: %%timeit
...: data = df
...: data_transformed = np.equal(data.astype(int).values.tolist(), 0).astype(str)
...: pd.DataFrame(data_transformed).apply(lambda x: [i.count('True') for i in ''.join(list(x)).split('False') if i], axis=1)
1 loop, best of 3: 519 ms per loop
# From this post
In [89]: %timeit pd.Series(islandlen_perrow(df, trigger_val=0))
100 loops, best of 3: 9.8 ms per loop
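The timings above only compare speed. As a sanity check, the vectorized function can also be compared against a plain `itertools.groupby` reference on random data. The sketch below is an assumption-laden illustration, not part of the original benchmark: `runs_groupby`, the seed and the frame size are made up here, and `islandlen_perrow` is repeated so the snippet runs on its own.

```python
from itertools import groupby

import numpy as np
import pandas as pd


def islandlen_perrow(df, trigger_val=0):
    # Same logic as the function in this answer, repeated for self-containment.
    a = df.values == trigger_val
    pad = np.zeros((a.shape[0], 1), dtype=bool)
    mask = np.hstack((pad, a, pad))
    mask_step = mask[:, 1:] != mask[:, :-1]
    idx = np.flatnonzero(mask_step)
    island_lens = idx[1::2] - idx[::2]
    n_islands_perrow = mask_step.sum(1) // 2
    return np.split(island_lens, n_islands_perrow[:-1].cumsum())


def runs_groupby(row, trigger_val=0):
    # Plain-Python reference: length of each run of trigger_val in one row.
    return [sum(1 for _ in grp)
            for is_hit, grp in groupby(row, key=lambda v: v == trigger_val)
            if is_hit]


rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
df = pd.DataFrame(rng.integers(0, 4, size=(200, 50)))

fast = [r.tolist() for r in islandlen_perrow(df)]
slow = [runs_groupby(row) for row in df.values]
results_match = fast == slow
print(results_match)
```

Rows without any zeros come back as empty arrays from `np.split`, which compare equal to the empty lists produced by the reference.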