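I have a list of lists (let's call it IDlist). What I want to do is remove every element of IDlist (each element is itself a list) that is a "substring" of another element of IDlist (another list), meaning that all of its IDs also appear in that other list.
It does not have to be a list; if it is easier, a pandas object is fine too.
The only approaches I came up with work only partially (only in specific cases), so they are useless. I really don't know how to make the list operate on "itself".
Here is part of the dataset. Take rows 61, 62, 63 and 64, for example: 61, 62 and 64 are substrings of 63, so I should keep only row 63.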
56 ['2588446634610274688', '2588446634612110336']
57 ['348020242217448576', '348020448377061376', '348020482735930112']
58 ['565983471644073472', '565989347158652288']
59 ['4912580642524184960', '4912898156569562624']
60 ['318121222523445376', '318121256883850112']
61 ['356731363606425856', '357478894075788928', '357479272034582528']
62 ['356731363606425856', '357478894075788928', '357479272034582528']
63 ['356731363606425856', '356731363608936576', '357478894075788928', '357479272034582528']
64 ['356731363606425856', '356731363608936576', '357478894075788928']
65 ['2512629230496996992', '2512629230497166848']
Output of the print command:
>>> print(templist)
[['318121222523445376', '318121256883850112'], ['356731363606425856', '357478894075788928', '357479272034582528'], ['356731363606425856', '357478894075788928', '357479272034582528'], ['356731363606425856', '356731363608936576', '357478894075788928', '357479272034582528'], ['356731363606425856', '356731363608936576', '357478894075788928'], ['2512629230496996992', '2512629230497166848']]
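To make the requirement concrete, here is a minimal sketch of the test I have in mind. It assumes that "substring" means "every ID in one list also appears in the other list", ignoring order and position (row 61 is not a contiguous slice of row 63, so a plain slice comparison would not match). The helper name `is_contained` is only illustrative.

```python
# Rows 60, 61, 63 and 64 from the sample data in the question.
row_60 = ['318121222523445376', '318121256883850112']
row_61 = ['356731363606425856', '357478894075788928', '357479272034582528']
row_63 = ['356731363606425856', '356731363608936576',
          '357478894075788928', '357479272034582528']
row_64 = ['356731363606425856', '356731363608936576', '357478894075788928']


def is_contained(small, big):
    """Return True if every ID in `small` also appears in `big` (order ignored)."""
    # Assumption: containment is a set-subset test, not a contiguous-slice test.
    return set(small) <= set(big)


print(is_contained(row_61, row_63))  # True
print(is_contained(row_64, row_63))  # True
print(is_contained(row_60, row_63))  # False
```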
The only solution I found is to loop over IDlist with nested loops and pop the subset lists from a copy of IDlist.
IDlist = [['2588446634610274688', '2588446634612110336'],
          ['348020242217448576', '348020448377061376', '348020482735930112'],
          ['565983471644073472', '565989347158652288'],
          ['4912580642524184960', '4912898156569562624'],
          ['318121222523445376', '318121256883850112'],
          ['318121222523445376', '318121256883850112'],
          ['356731363606425856', '357478894075788928', '357479272034582528'],
          ['356731363606425856', '357478894075788928', '357479272034582528'],
          ['356731363606425856', '356731363608936576', '357478894075788928', '357479272034582528'],
          ['356731363606425856', '356731363608936576', '357478894075788928'],
          ['2512629230496996992', '2512629230497166848']]
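
def is_subset(a, b):
    # True if every element of a also appears in b
    for i in a:
        if i not in b:
            return False
    return True


new_IDlist = IDlist.copy()
for id_j, j in enumerate(IDlist):
    for id_k, k in enumerate(IDlist):
        if id_k == id_j:
            continue  # do not compare a list with itself
        if len(k) < len(j):
            # k is shorter: if it is contained in j, drop one copy of k
            if is_subset(k, j):
                for idx, item in enumerate(new_IDlist):
                    if k == item:
                        new_IDlist.pop(idx)
                        break
        else:
            # k is at least as long as j: if it contains j, keep the
            # first copy of k and drop later duplicates of it
            if is_subset(j, k):
                cnt = 0
                for idx, item in enumerate(new_IDlist):
                    if k == item:
                        if cnt:
                            new_IDlist.pop(idx)
                        else:
                            cnt += 1

# Print the surviving lists, one per line
for row in new_IDlist:
    print(row)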
Output:
['2588446634610274688', '2588446634612110336']
['348020242217448576', '348020448377061376', '348020482735930112']
['565983471644073472', '565989347158652288']
['4912580642524184960', '4912898156569562624']
['318121222523445376', '318121256883850112']
['356731363606425856', '356731363608936576', '357478894075788928', '357479272034582528']
['2512629230496996992', '2512629230497166848']
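As a possible alternative, the same filtering can be sketched with sets and without popping from a list while iterating over it (the duplicate-removal loop above keeps enumerating `new_IDlist` after a `pop`, which skips the element that follows the removed one). This is only a sketch under the same assumption that containment means "subset of IDs, order ignored"; the function name `drop_contained_lists` and the shortened sample data are illustrative, not part of the original post.

```python
def drop_contained_lists(lists):
    """Return the lists that are not contained in (or equal to) another kept list."""
    sets = [set(item) for item in lists]
    # Visit lists with more distinct IDs first, so a superset is always seen
    # before its subsets. sorted() is stable, so among equal sizes the
    # earlier list is visited (and kept) first.
    order = sorted(range(len(lists)), key=lambda i: len(sets[i]), reverse=True)

    kept_idx = []   # positions of the surviving lists
    kept_sets = []  # their contents, as sets
    for i in order:
        # Skip a list whose IDs all appear in an already kept list;
        # this also removes exact duplicates.
        if any(sets[i] <= other for other in kept_sets):
            continue
        kept_idx.append(i)
        kept_sets.append(sets[i])

    # Restore the original ordering of the survivors.
    return [lists[i] for i in sorted(kept_idx)]


sample = [
    ['318121222523445376', '318121256883850112'],
    ['318121222523445376', '318121256883850112'],
    ['356731363606425856', '357478894075788928', '357479272034582528'],
    ['356731363606425856', '357478894075788928', '357479272034582528'],
    ['356731363606425856', '356731363608936576', '357478894075788928', '357479272034582528'],
    ['356731363606425856', '356731363608936576', '357478894075788928'],
    ['2512629230496996992', '2512629230497166848'],
]

result = drop_contained_lists(sample)
for row in result:
    print(row)
```

On this shortened sample the function keeps three lists: the first '318121...' pair, the four-element '356731...' list, and the '251262...' pair. Each list is compared only against the lists already kept, so the input is never modified.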