Find unique sentences in two files

Posted by Raj on Dev

Raj

I have two files, and I am trying to print the sentences that are unique to each of them. For this I am using difflib in Python.

import difflib

text = 'Physics is one of the oldest academic disciplines. Perhaps the oldest through its inclusion of astronomy. Over the last two millennia. Physics was a part of natural philosophy along with chemistry.'
text1 = 'Physics is one of the oldest academic disciplines. Physics was a part of natural philosophy along with chemistry. Quantum chemistry is a branch of chemistry.'

differ = difflib.Differ()
diff = differ.compare(text, text1)
print('\n'.join(diff))

It doesn't give me the output I want. Instead it gives me something like this:

  P
  h
  y
  s
  i
  c
  s

  i
  s

  o
  n
  e

  o
  f

  t
  h
  e

The output I want is only the sentences that are unique to each of the two files:

text = Perhaps the oldest through its inclusion of astronomy. Over the last two millennia.

text1 = Quantum chemistry is a branch of chemistry.

It also seems that difflib.Differ works line by line rather than sentence by sentence. Any suggestions on how I can do this?

迪兹

First of all, Differ().compare() compares lines, not sentences.

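Second, what it really compares is two sequences, such as two lists of strings. You, however, are passing two strings, not two lists of strings. Since a string is also a sequence (of characters), in your case Differ().compare() compares the individual characters.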
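To see this concretely, the following sketch feeds Differ().compare() the same kind of input both ways. The short strings are made up purely for illustration; only difflib.Differ comes from the thread.

```python
import difflib

differ = difflib.Differ()

# Two plain strings: compare() iterates over them character by character,
# so each output line is a two-character tag ("  ", "- " or "+ ") plus ONE character.
char_diff = list(differ.compare("cat sat", "cat ran"))

# Two lists of strings: each list element is treated as one unit and compared whole.
unit_diff = list(differ.compare(["cat sat.", "dog ran."],
                                ["cat sat.", "bird flew."]))

print(char_diff)
print(unit_diff)
```

The first call produces one output line per character, exactly like the question's output, while the second produces one line per list element.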

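If you want to compare the files sentence by sentence, you have to prepare two lists of sentences first. You can use nltk.sent_tokenize(text) to split a string into sentences.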
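If installing NLTK is not an option, the splitting step can be approximated without it. The sketch below is a naive, dependency-free stand-in (a swapped-in technique, not what the answer uses): it splits on whitespace that follows '.', '!' or '?'. The helper name split_sentences is hypothetical, and unlike NLTK's Punkt model it will mis-split abbreviations such as "e.g." or "Dr.".

```python
import re

# Hypothetical helper: a naive stand-in for nltk.sent_tokenize.
# Splits on whitespace preceded by '.', '!' or '?'; abbreviations
# like "e.g. " or "Dr. " will be split incorrectly.
_SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

def split_sentences(text):
    """Return the non-empty sentences in text, in their original order."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]

text = ('Physics is one of the oldest academic disciplines. '
        'Perhaps the oldest through its inclusion of astronomy. '
        'Over the last two millennia. '
        'Physics was a part of natural philosophy along with chemistry.')

for sentence in split_sentences(text):
    print(sentence)
```

For well-punctuated text like the question's, this yields the same four sentences; for real-world prose, nltk.sent_tokenize is the more robust choice.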

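import difflib
import nltk  # sent_tokenize relies on NLTK's Punkt model; download it once via nltk.download() if it is missing

# text and text1 are the two strings from the question
differ = difflib.Differ()
diff = differ.compare(nltk.sent_tokenize(text), nltk.sent_tokenize(text1))
print('\n'.join(diff))
#  Physics is one of the oldest academic disciplines.
#- Perhaps the oldest through its inclusion of astronomy.
#- Over the last two millennia.
#  Physics was a part of natural philosophy along with chemistry.
#+ Quantum chemistry is a branch of chemistry.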
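The diff above still mixes common and unique sentences. To get exactly the output the question asks for, one option is to keep only the lines Differ tags with "- " (only in the first text) or "+ " (only in the second). In this sketch the helper unique_sentences is hypothetical, and the sentence lists are typed in by hand to match the answer's nltk.sent_tokenize output, so the block runs without NLTK.

```python
import difflib

def unique_sentences(sents_a, sents_b):
    """Return (only_in_a, only_in_b) by reading the tag Differ puts on each line."""
    only_a, only_b = [], []
    for line in difflib.Differ().compare(sents_a, sents_b):
        tag, sentence = line[:2], line[2:]
        if tag == '- ':        # present in sents_a only
            only_a.append(sentence)
        elif tag == '+ ':      # present in sents_b only
            only_b.append(sentence)
        # '  ' marks a sentence common to both; '? ' lines are intraline hints
    return only_a, only_b

# Hand-typed to match the sentences nltk.sent_tokenize returned in the answer
sents = ['Physics is one of the oldest academic disciplines.',
         'Perhaps the oldest through its inclusion of astronomy.',
         'Over the last two millennia.',
         'Physics was a part of natural philosophy along with chemistry.']
sents1 = ['Physics is one of the oldest academic disciplines.',
          'Physics was a part of natural philosophy along with chemistry.',
          'Quantum chemistry is a branch of chemistry.']

only_text, only_text1 = unique_sentences(sents, sents1)
print('text =', ' '.join(only_text))
print('text1 =', ' '.join(only_text1))
```

Differ is order-sensitive, so a sentence that merely moves to a different position can appear as both removed and added. If order does not matter, a membership test such as [s for s in sents if s not in set(sents1)] is a simpler alternative.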


Edited on 2021-05-20
