Searching Solr for special characters

辛吉杰夫

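I'm having trouble searching for special characters in Solr. My documents have a "Title" field whose value can sometimes look like "Titanic-1999" (it contains the character "-"). When I try to search in Solr using "-", I get a 400 error. I tried escaping the character, using things like "-" and "\-". With these changes Solr no longer responds with an error, but it returns 0 results.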

How can I search in the Solr admin for values that contain special characters such as "-" or "'"?

Regards

Update: here you can see my current Solr schema: https://gist.github.com/cpalomaresbazuca/6269375

The search is on the "Title" field.

Excerpt from schema.xml:

 ...
 <!-- A general text field that has reasonable, generic
     cross-language defaults: it tokenizes with StandardTokenizer,
     removes stop words from case-insensitive "stopwords.txt"
     (empty by default), and down cases.  At query time only, it
     also applies synonyms. -->
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
            <!-- in this example, we will only use synonyms at query time
             <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
             -->
            <filter class="solr.LowerCaseFilterFactory"/>

        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>

        </analyzer>
    </fieldType>
...
<field name="Title" type="text_general" indexed="true" stored="true"/>

希尔斯

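You are using the generic text_general field type for the Title attribute. That is probably not a good choice: text_general is meant for larger bodies of text (or at least sentences), not for exact matching of names or titles.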

The problem here is that text_general uses StandardTokenizerFactory:

 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
            <!-- in this example, we will only use synonyms at query time
             <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
             -->
            <filter class="solr.LowerCaseFilterFactory"/>

        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>

        </analyzer>
    </fieldType>

StandardTokenizerFactory does the following:

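A good general purpose tokenizer that strips many extraneous characters and sets token types to meaningful values. Token types are only useful for subsequent token filters that are type-aware of the same token types.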

This means the "-" character is dropped entirely and treated as a token separator.

"kong-fu" is indexed as two separate tokens, "kong" and "fu"; the "-" disappears.

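This also explains why select?q=title:\- does not work here: no "-" token was ever indexed, so there is nothing for the query to match.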

Choose a more suitable field type:

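Instead of StandardTokenizerFactory, you can use solr.WhitespaceTokenizerFactory, which splits only on whitespace, so words are matched exactly as written. Creating your own field type for the title attribute would therefore be one solution.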
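A minimal sketch of such a field type, assuming case-insensitive matching is wanted. The type name text_title is hypothetical, and adding LowerCaseFilterFactory is an assumption rather than something the question asks for. A single analyzer without a type attribute is used for both indexing and querying.

```xml
<!-- Hypothetical field type for titles (name "text_title" is illustrative).
     Splits only on whitespace, so "Titanic-1999" stays one token. -->
<fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
    <!-- No type attribute: the same analyzer applies at index and query time -->
    <analyzer>
        <!-- Break text at whitespace only; '-' and other punctuation are kept -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Assumption: lowercase tokens so "titanic-1999" matches "Titanic-1999" -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<!-- Point the existing Title field at the new type -->
<field name="Title" type="text_title" indexed="true" stored="true"/>
```

Analysis happens when a document is indexed, so existing documents have to be reindexed after the field type changes. With the standard query parser, a query such as Title:Titanic-1999 should then be analyzed into the single token titanic-1999.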

Solr also ships with a minimal field type called text_ws. Depending on your requirements, it may be enough.
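For reference, text_ws in Solr's example schema is defined roughly as follows. Check the schema.xml you actually deploy, since the example schema varies between versions. Switching the Title field to it only requires changing the type attribute.

```xml
<!-- Minimal type: whitespace tokenization only, no lowercasing, stop words or synonyms -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
</fieldType>

<!-- Title now keeps "Titanic-1999" as a single token -->
<field name="Title" type="text_ws" indexed="true" stored="true"/>
```

Because text_ws has no LowerCaseFilterFactory, matching is case-sensitive: Titanic-1999 and titanic-1999 are different tokens.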

Edited 2021-03-6
