Python - Loop problem when comparing two lists

Test User

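I have a small problem: I am trying to compare two lists of words to compute a similarity percentage, but if the same word appears twice in each list, I get a wrong percentage.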

First I wrote this small script:

data1 = ['test', 'super', 'class', 'test', 'boom']
data2 = ['test', 'super', 'class', 'test', 'boom']
res = 0
nb = (len(data1) + len(data2)) / 2
if data1 and data2 and nb != 0:
    for id1, item1 in enumerate(data1):
        for id2, item2 in enumerate(data2):
            if item1 == item2:
                res += 1 - abs(id1 - id2) / nb
    print(res / nb * 100)

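The problem is that if the same word appears twice in the lists, the percentage goes above 100%. To fix this, I added a 'break' right after the line 'res += 1 - abs(id1 - id2) / nb', but the percentage is still wrong.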
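
To make both symptoms concrete, here is a minimal sketch that wraps the loop from the script in a function. The name `positional_score` and the flag `stop_at_first_match` are hypothetical and only there for illustration; the scoring itself is unchanged.

```python
def positional_score(data1, data2, stop_at_first_match=False):
    """Hypothetical wrapper around the scoring loop from the script."""
    if not data1 or not data2:
        return 0.0
    nb = (len(data1) + len(data2)) / 2
    res = 0
    for id1, item1 in enumerate(data1):
        for id2, item2 in enumerate(data2):
            if item1 == item2:
                res += 1 - abs(id1 - id2) / nb
                if stop_at_first_match:
                    break  # the attempted fix: stop after the first match
    return res / nb * 100


words = ['test', 'super', 'class', 'test', 'boom']
no_break = positional_score(words, words)
with_break = positional_score(words, words, stop_at_first_match=True)
print(round(no_break, 2))    # 116.0
print(round(with_break, 2))  # 88.0
```

Without the 'break', each 'test' matches both 'test' entries in the other list, so the word contributes four pairs instead of two and the total passes 100%. With the 'break', the second 'test' (index 3) is always paired with the first 'test' (index 0), so it scores 0.4 instead of 1 and two identical lists come out below 100%.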

I hope my problem is clear. Thank you for your help!

阿尔基斯塔夫·克尔佐恩斯泰夫

You can use difflib.SequenceMatcher instead to compare the similarity of the two lists. Try this:

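from difflib import SequenceMatcher as sm

data1 = ['test', 'super', 'class', 'test', 'boom']
data2 = ['test', 'super', 'class', 'test', 'boom']
# ratio() returns a score in [0, 1], computed as 2 * M / T, where M is the
# number of matched elements and T is the combined length of both lists
matching_percentage = sm(None, data1, data2).ratio() * 100
print(matching_percentage)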

Output: