How to find matches around fixed strings

Posted by Pro Q in Dev

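I'm looking for Python functions that would let me take a list of strings (such as ["I like ", " and ", " because "]) and a single target string (such as "I like lettuce and carrots and onions because I do"), and find every way the characters of the target string can be grouped so that each string from the list appears in order.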

For example:

solution(["I like ", " and ", " because ", "do"],
         "I like lettuce and carrots and onions because I do")

should return:

[("I like ", "lettuce", " and ", "carrots and onions", " because ", "I ", "do"), 
 ("I like ", "lettuce and carrots", " and ", "onions", " because ", "I ", "do")]

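Note that in each tuple the strings from the list argument appear in order, and that the function returns every possible way of splitting the target string that achieves this.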

Another example, this time with only one possible way of grouping the characters:

solution(["take ", " to the park"], "take Alice to the park")

should give the result:

[("take ", "Alice", " to the park")]

Here is an example where the characters cannot be grouped correctly:

solution(["I like ", " because ", ""],
         "I don't like cheese because I'm lactose-intolerant")

should return:

[]

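because there is no way to do it. Note that the "I like " in the first argument cannot be split up. The target string does not contain the string "I like ", so there is no way it could match.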

Here is a final example, again with multiple options:

solution(["I", "want", "or", "done"],
         "I want my sandwich or I want my pizza or salad done")

should return:

[("I", " ", "want", " my sandwich ", "or", " I want my pizza or salad ", "done"),
 ("I", " ", "want", " my sandwich or I want my pizza ", "or", " salad ", "done"),
 ("I", " want my sandwich or I ", "want", " my pizza ", "or", " salad ", "done")]

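Note again that each of the strings in ["I", "want", "or", "done"] is included in each tuple, in order, and that the remaining characters are grouped around those strings in every possible way. The function returns the list of all such groupings.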

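Note also that the first string in the list is assumed to appear at the start of the target string, and the last string in the list at the end of the target string. (If they don't, the function should return an empty list.)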
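The rules stated so far can be written down as a small checker. This is only a sketch of how the requirements read: the name `is_valid_split` and its signature are hypothetical, not part of the original question. It assumes a valid tuple must reassemble the target exactly, begin with the first fixed string, end with the last one, and contain all fixed strings in order.

```python
from typing import Sequence


def is_valid_split(fixed: Sequence[str], target: str, parts: Sequence[str]) -> bool:
    """Check one candidate tuple against the rules described above (hypothetical helper)."""
    if not fixed or not parts:
        return False
    # The pieces must reassemble the target string exactly.
    if "".join(parts) != target:
        return False
    # The first and last pieces must be the first and last fixed strings.
    if parts[0] != fixed[0] or parts[-1] != fixed[-1]:
        return False
    # The fixed strings must occur among the pieces in order (as a subsequence).
    # `word in remaining` consumes the iterator up to and including the match.
    remaining = iter(parts)
    return all(word in remaining for word in fixed)


print(is_valid_split(["take ", " to the park"], "take Alice to the park",
                     ("take ", "Alice", " to the park")))  # True
print(is_valid_split(["take ", " to the park"], "take Alice to the park",
                     ("take Alice", " to the park")))      # False
```

A checker like this does not enumerate the splits, but it is a convenient way to test any candidate implementation of `solution` against the examples.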

Which Python functions would allow me to do this?

I have tried using regex functions, but they seem to fail in cases where there is more than one option.
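To make the limitation concrete, here is a minimal sketch of the single-pattern approach. The helper `single_regex_split` is hypothetical and is not the code that was actually tried; it assumes the fixed strings are joined with a capturing wildcard and matched with `re.fullmatch`. A single match can report only one grouping, and which one depends on whether the wildcard is greedy or lazy.

```python
import re
from typing import Optional, Sequence, Tuple


def single_regex_split(fixed: Sequence[str], target: str,
                       lazy: bool = False) -> Optional[Tuple[str, ...]]:
    """Return the gap texts from ONE regex match, or None (hypothetical helper)."""
    wildcard = "(.*?)" if lazy else "(.*)"
    # re.escape makes every fixed string match literally.
    pattern = wildcard.join(re.escape(s) for s in fixed)
    match = re.fullmatch(pattern, target)
    return match.groups() if match else None


fixed = ["I like ", " and ", " because ", "do"]
target = "I like lettuce and carrots and onions because I do"

print(single_regex_split(fixed, target))
# ('lettuce and carrots', 'onions', 'I ')
print(single_regex_split(fixed, target, lazy=True))
# ('lettuce', 'carrots and onions', 'I ')
```

Each call yields exactly one of the two valid groupings, so enumerating all of them needs some form of explicit search over the candidate positions.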

I have a solution. It needs a lot of refactoring, but it seems to work. I hope it helps; this was a very interesting problem.

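import itertools
import re


def solution(search_words, search_string):
    # For every search word, collect the start index of each occurrence.
    # re.escape makes each word match literally, even if it contains
    # regex metacharacters such as "." or "(".
    found = [[m.start() for m in re.finditer(re.escape(search_word), search_string)]
             for search_word in search_words]
    if not found:
        return []  # no search words given
    # Try one position per word; keep only combinations in non-decreasing order.
    # If any word was not found, the product is empty and the result is [].
    word_positions_lst = [list(i) for i in itertools.product(*found) if sorted(i) == list(i)]

    ret_lst = []
    for word_positions in word_positions_lst:
        # Flatten the (start, end) pair of every search word into one list of cut points.
        split_positions = list(itertools.chain.from_iterable(
            (split_position, split_position + len(search_word))
            for split_position, search_word in zip(word_positions, search_words)))
        # Text left over after the last search word, if any, becomes a final piece.
        trailing_text = search_string[split_positions[-1]:]
        ret_strs = [search_string[a:b] for a, b in zip(split_positions, split_positions[1:])]
        if trailing_text:
            ret_strs.append(trailing_text)
        # Accept the split only if the pieces add up to the whole string; this
        # rejects, for example, combinations whose first word does not start at index 0.
        if len(search_string) == sum(map(len, ret_strs)):
            ret_lst.append(tuple(ret_strs))
    return ret_lst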


print(solution(["I like ", " and ", " because ", "do"],
               "I like lettuce and carrots and onions because I do"))
print([("I like ", "lettuce", " and ", "carrots and onions", " because ", "I ", "do"),
       ("I like ", "lettuce and carrots", " and ", "onions", " because ", "I ", "do")])
print()

print(solution(["take ", " to the park"], "take Alice to the park"))
print([("take ", "Alice", " to the park")])
print()

print(solution(["I like ", " because "],
               "I don't like cheese because I'm lactose-intolerant"))
print([])
print()

print(solution(["I", "want", "or", "done"],
               "I want my sandwich or I want my pizza or salad done"))
print([("I", " ", "want", " my sandwich ", "or", " I want my pizza or salad ", "done"),
       ("I", " ", "want", " my sandwich or I want my pizza ", "or", " salad ", "done"),
       ("I", " want my sandwich or I ", "want", " my pizza ", "or", " salad ", "done")])

Output:

[('I like ', 'lettuce', ' and ', 'carrots and onions', ' because ', 'I ', 'do'), ('I like ', 'lettuce and carrots', ' and ', 'onions', ' because ', 'I ', 'do')]
[('I like ', 'lettuce', ' and ', 'carrots and onions', ' because ', 'I ', 'do'), ('I like ', 'lettuce and carrots', ' and ', 'onions', ' because ', 'I ', 'do')]

[('take ', 'Alice', ' to the park')]
[('take ', 'Alice', ' to the park')]

[]
[]

[('I', ' ', 'want', ' my sandwich ', 'or', ' I want my pizza or salad ', 'done'), ('I', ' ', 'want', ' my sandwich or I want my pizza ', 'or', ' salad ', 'done'), ('I', ' want my sandwich or I ', 'want', ' my pizza ', 'or', ' salad ', 'done')]
[('I', ' ', 'want', ' my sandwich ', 'or', ' I want my pizza or salad ', 'done'), ('I', ' ', 'want', ' my sandwich or I want my pizza ', 'or', ' salad ', 'done'), ('I', ' want my sandwich or I ', 'want', ' my pizza ', 'or', ' salad ', 'done')]

Edit: refactored the code to use meaningful variable names.

Edit 2: added the last case, which I had forgotten.
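For comparison, the same enumeration can also be sketched as a recursive search with `str.find`, which places the fixed strings one at a time instead of filtering the full Cartesian product of positions. This is an alternative sketch, not the approach of the answer above. The name `all_splits` is hypothetical, and it assumes that empty gaps between adjacent fixed strings are dropped from the tuples and that the last fixed string must end exactly at the end of the target.

```python
from typing import Iterator, List, Sequence, Tuple


def all_splits(fixed: Sequence[str], target: str) -> List[Tuple[str, ...]]:
    """Enumerate every grouping of target around the fixed strings (hypothetical helper)."""
    # The first fixed string must sit at the very start of the target.
    if not fixed or not target.startswith(fixed[0]):
        return []

    def extend(index: int, pos: int, parts: Tuple[str, ...]) -> Iterator[Tuple[str, ...]]:
        # pos is the offset just past the previously placed fixed string.
        if index == len(fixed):
            # Every fixed string is placed; accept only if nothing is left over.
            if pos == len(target):
                yield parts
            return
        word = fixed[index]
        start = target.find(word, pos)
        while start != -1:
            gap = target[pos:start]
            # Assumption: an empty gap is omitted rather than stored as "".
            pieces = parts + ((gap,) if gap else ()) + (word,)
            yield from extend(index + 1, start + len(word), pieces)
            # Try the next occurrence, including overlapping ones.
            start = target.find(word, start + 1)

    return list(extend(1, len(fixed[0]), (fixed[0],)))


print(all_splits(["take ", " to the park"], "take Alice to the park"))
# [('take ', 'Alice', ' to the park')]
```

Because each fixed string is only searched for after the previous one ends, out-of-order and overlapping placements are never generated, which keeps the search smaller when the fixed strings occur many times.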

Edited on 2020-11-30