How to check whether all elements of a numpy array are in another numpy array

Posted by doodoroma in Dev

I have two 2D numpy arrays, for example:

A = numpy.array([[1, 2, 4, 8], [16, 32, 32, 8], [64, 32, 16, 8]])

and

B = numpy.array([[1, 2], [32, 32]])

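I want all rows of A in which I can find all elements of any one row of B. If a row of B contains the same element twice, the row of A must contain that element at least twice as well. For my example, I want to get: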

A_filtered = [[1, 2, 4, 8], [16, 32, 32, 8]]
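The requirement above is multiset containment: a row of A qualifies if it holds every element of some row of B, counting duplicates. A minimal reference sketch with `collections.Counter` follows. It uses plain Python loops, so it is only meant for checking faster versions on small inputs; the name `filter_rows_reference` is made up for this illustration.

```python
from collections import Counter

import numpy as np


def filter_rows_reference(A, B):
    """Keep rows of A that contain some row of B as a multiset (slow, for checking)."""
    needed = [Counter(row) for row in B.tolist()]
    keep = []
    for row in A.tolist():
        have = Counter(row)
        # Counter subtraction drops non-positive counts, so an empty result
        # means every element of the B row is covered, duplicates included.
        keep.append(any(not (need - have) for need in needed))
    return A[np.array(keep, dtype=bool)]


A = np.array([[1, 2, 4, 8], [16, 32, 32, 8], [64, 32, 16, 8]])
B = np.array([[1, 2], [32, 32]])
A_filtered = filter_rows_reference(A, B)
print(A_filtered.tolist())  # [[1, 2, 4, 8], [16, 32, 32, 8]]
```

The last row of A is dropped because it contains 32 only once, while the row [32, 32] of B needs it twice.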

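I control how the values are represented, so I chose numbers whose binary representation has a single 1 bit (for example 0b00000001, 0b00000010, and so on). That way I can easily check whether all kinds of values are present in a row by using the np.logical_or.reduce() function, but I cannot check whether a row of A contains at least as many copies of an element as the row of B requires. I was really hoping to avoid simple for loops and deep copies of the arrays, because performance is very important to me.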
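To make the limitation concrete, here is a small sketch of the bit-flag check. It assumes the flags are combined with `np.bitwise_or.reduce` (the question names `np.logical_or.reduce`, so the exact call used by the asker may differ). The per-row mask records which kinds of values occur, but not how many times.

```python
import numpy as np

A = np.array([[1, 2, 4, 8], [16, 32, 32, 8], [64, 32, 16, 8]])
B = np.array([[1, 2], [32, 32]])

# OR all flags in each row together: one integer mask per row.
mask_A = np.bitwise_or.reduce(A, axis=1)  # shape (3,)
mask_B = np.bitwise_or.reduce(B, axis=1)  # shape (2,)

# covers[i, j] is True when every flag set in B row j is also set in A row i.
covers = (mask_A[:, None] & mask_B[None, :]) == mask_B[None, :]
has_all_kinds = covers.any(axis=1)
print(has_all_kinds.tolist())  # [True, True, True]
```

The third row is reported as a match even though it holds only one 32, because OR-ing [32, 32] gives the same mask as a single 32.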

How can I do this efficiently in numpy?

Update:

This solution might work, but I think performance would be a big problem for me: A can be really large (> 300000 rows) and B can be of moderate size (> 30 rows):

[set(row).issuperset(hand) for row in A.tolist() for hand in B.tolist()]

Update 2:

The set() solution does not work, because set() discards all duplicate values.
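A short illustration of that failure, using the rows from the example (the variable names are chosen here for illustration only):

```python
A_rows = [[1, 2, 4, 8], [16, 32, 32, 8], [64, 32, 16, 8]]
hand = [32, 32]

# issuperset() only asks whether each value is present at all,
# so a single 32 in the row is enough to satisfy both 32s in `hand`.
set_result = [set(row).issuperset(hand) for row in A_rows]
print(set_result)  # [False, True, True]
```

The last row should not match, since it contains 32 only once.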

max9111

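I hope I understood your question correctly. At least this works for the problem you described in your question. If the output should keep the same order as the input, change the in-place sorting.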

The code looks quite ugly, but it should perform well and should not be hard to understand.

Code

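import time
import numba as nb
import numpy as np

# Expects every row of A and of B to be sorted in ascending order.
# Note: the name shadows Python's built-in filter().
@nb.njit(fastmath=True,parallel=True)
def filter(A,B):
  # iFilter[i] becomes True when row i of A contains some row of B
  iFilter=np.zeros(A.shape[0],dtype=nb.bool_)

  # rows of A are independent, so they are processed in parallel
  for i in nb.prange(A.shape[0]):
    break_loop=False

    for j in range(B.shape[0]):
      # index of the next element of B[j] that still has to be found
      ind_to_B=0
      for k in range(A.shape[1]):
        if A[i,k]==B[j,ind_to_B]:
          ind_to_B+=1

        # every element of B[j] was found in A[i]
        if ind_to_B==B.shape[1]:
          iFilter[i]=True
          break_loop=True
          break

      # one matching row of B is enough
      if break_loop:
        break

  return A[iFilter,:]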
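Regarding the remark about in-place sorting: one possible way to keep the input order is to sort copies and return a boolean mask instead of the filtered rows. The sketch below shows the same sorted scan in plain Python, without Numba, so it is slow and only illustrates the logic. The name `match_mask` is hypothetical and not part of the answer.

```python
import numpy as np


def match_mask(A, B):
    """Boolean mask over rows of A; sorts copies so A and B stay untouched."""
    A_sorted = np.sort(A, axis=1)  # np.sort returns a sorted copy
    B_sorted = np.sort(B, axis=1)
    mask = np.zeros(A.shape[0], dtype=bool)
    for i, a_row in enumerate(A_sorted):
        for b_row in B_sorted:
            matched = 0  # how many elements of b_row were found so far
            for value in a_row:
                if value == b_row[matched]:
                    matched += 1
                    if matched == len(b_row):
                        break
            if matched == len(b_row):
                mask[i] = True
                break  # one matching row of B is enough
    return mask


A = np.array([[8, 4, 2, 1], [32, 16, 8, 32], [64, 32, 16, 8]])
B = np.array([[2, 1], [32, 32]])
mask = match_mask(A, B)
A_filtered = A[mask]  # rows keep their original element order
print(A_filtered.tolist())  # [[8, 4, 2, 1], [32, 16, 8, 32]]
```

Because both rows are sorted, a single left-to-right pass is enough: a sorted row of B is contained in a sorted row of A, duplicates included, exactly when it appears as a subsequence.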

Measuring performance

#### First call includes some compilation overhead ####
A=np.random.randint(low=0, high=60, size=300_000*4).reshape(300_000,4)
B=np.random.randint(low=0, high=60, size=30*2).reshape(30,2)

t1=time.time()
# Sort each row in place first (filter expects sorted rows)
A.sort()
B.sort()
A_filtered=filter(A,B)
print(time.time()-t1)

#### Let's measure the second call too ####
A=np.random.randint(low=0, high=60, size=300_000*4).reshape(300_000,4)
B=np.random.randint(low=0, high=60, size=30*2).reshape(30,2)

t1=time.time()
# Sort each row in place first (filter expects sorted rows)
A.sort()
B.sort()
A_filtered=filter(A,B)
print(time.time()-t1)

Results

46 ms after the first run on a dual-core notebook (sorting included)
32 ms (sorting excluded)
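For readers without Numba, a pure NumPy alternative is to compare occurrence counts with broadcasting. This is a different technique from the sorted scan above and has not been timed here; the name `match_mask_counts` is hypothetical. The intermediate boolean array has shape (rows of A, rows of B, columns of B, columns of A), so for very large A it may need to be applied to chunks of rows.

```python
import numpy as np


def match_mask_counts(A, B):
    """Mask of A rows that contain some B row as a multiset, using broadcasting."""
    # count_in_A[i, j, c]: how often B[j, c] occurs in A[i]
    count_in_A = (A[:, None, None, :] == B[None, :, :, None]).sum(axis=-1)
    # count_in_B[j, c]: how often B[j, c] occurs in its own row B[j]
    count_in_B = (B[:, :, None] == B[:, None, :]).sum(axis=-1)
    # A row i covers B row j when every element of B[j] occurs often enough.
    covers = (count_in_A >= count_in_B[None, :, :]).all(axis=-1)
    return covers.any(axis=1)


A = np.array([[1, 2, 4, 8], [16, 32, 32, 8], [64, 32, 16, 8]])
B = np.array([[1, 2], [32, 32]])
count_mask = match_mask_counts(A, B)
print(A[count_mask].tolist())  # [[1, 2, 4, 8], [16, 32, 32, 8]]
```

No sorting is needed with this approach, so the rows of A come back unchanged.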


Edited on 2020-11-24
