How to sort Norwegian characters (Æ, Ø, and Å) case-insensitively with Hibernate Lucene Search?

父亲

Æ, Ø, and Å are the last three letters of the Norwegian alphabet:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Æ Ø Å

When we sort with Hibernate Search (Lucene), Å is grouped with A, Ø with O, and Æ with A, which is wrong. For example:

Current result:

Aaalu, Åaalu, Baalu, Zaalu

Expected result:

Aaalu, Baalu, Zaalu, Åaalu

Here is the code I am working with:

@AnalyzerDef(name = "myOwnAnalyzer",
tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
filters = {
    @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
    @TokenFilterDef(factory = LowerCaseFilterFactory.class),
    @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
        @Parameter(name = "pattern", value = "('-&\\.,\\(\\))"),
        @Parameter(name = "replacement", value = " "),
        @Parameter(name = "replace", value = "all")
    }),
    @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
        @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
        @Parameter(name = "replacement", value = ""),
        @Parameter(name = "replace", value = "all")
    }),
    @TokenFilterDef(factory = TrimFilterFactory.class)
}
)
public class KikaPaya implements Serializable {

    @Fields({
        @Field(index = Index.YES, store = Store.YES),
        @Field(name = "KikaPayaName_for_sort", index = Index.YES,
               analyzer = @Analyzer(definition = "myOwnAnalyzer"))
    })
    @Column(name = "NAME", length = 100)
    private String name;

Main:

  FullTextEntityManager ftem = Search.getFullTextEntityManager(factory.createEntityManager());
  QueryBuilder qb = ftem.getSearchFactory().buildQueryBuilder().forEntity( KikaPaya.class ).get();
  org.apache.lucene.search.Query query = qb.all().getQuery(); 
  FullTextQuery fullTextQuery = ftem.createFullTextQuery(query, KikaPaya.class);
  fullTextQuery.setSort(new Sort(new SortField("KikaPayaName_for_sort", SortField.STRING, true)));
  fullTextQuery.setFirstResult(0).setMaxResults(150);
  int size = fullTextQuery.getResultSize();
  List<KikaPaya> result = fullTextQuery.getResultList();
  for (KikaPaya user : result) {
    logger.info("KikaPaya Name:" + user.getName());
  }

These are the Hibernate and Lucene versions in use (I cannot change them):

<hibernate.version>4.2.8.Final</hibernate.version>
<hibernate.search.version>4.3.0.Final</hibernate.search.version>

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-entitymanager</artifactId>
    <version>4.2.8.Final</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>3.6.2</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers</artifactId>
    <version>3.6.2</version>
</dependency>

Can anyone suggest a way to get the correct result?

马克西姆·佩切努克

You can use the org.apache.lucene.collation.CollationKeyFilter class with Hibernate Search 4.3.0.Final. Create your own collation filter factory:

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.collation.CollationKeyFilter;
import org.apache.solr.analysis.BaseTokenFilterFactory;

import java.text.Collator;
import java.util.Locale;

public final class NorwegianCollationFactory extends BaseTokenFilterFactory {

    @Override
    public TokenStream create(TokenStream input) {
        Collator norwegianCollator = Collator.getInstance(new Locale("no", "NO"));
        return new CollationKeyFilter(input, norwegianCollator);
    }

}
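
To see why replacing each term with its collation key helps, here is a minimal plain-Java sketch that uses only java.text and no Lucene classes. It assumes the JDK ships Norwegian collation rules for the "no" locale; the class name, helper methods, and sample names are hypothetical. The names are ordered by the raw bytes of their collation keys, which is roughly what a string sort on the indexed field ends up doing once the filter has rewritten the terms.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class NorwegianCollationKeySketch {

    // Compare two byte arrays as unsigned bytes, most significant byte first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Sort by the raw bytes of each name's collation key, not by the text itself.
    static List<String> sortByCollationKey(List<String> names) {
        // Same locale as the factory above; equal to new Locale("no", "NO").
        Collator collator = Collator.getInstance(Locale.forLanguageTag("no-NO"));
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparing(
                (String name) -> collator.getCollationKey(name).toByteArray(),
                NorwegianCollationKeySketch::compareUnsigned));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical names, written with Unicode escapes:
        // 00C6 is the AE ligature, 00D8 is O with stroke, 00C5 is A with ring.
        List<String> names = Arrays.asList(
                "\u00C5lesund", "Zulu", "\u00D8rland", "Bergen", "\u00C6r\u00F8");
        System.out.println(sortByCollationKey(names));
    }
}
```

With Norwegian rules, the three special letters should come out after Z in the order Æ, Ø, Å, whereas a plain code-point sort would put Å before Æ.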

Then use this collation factory in the AnalyzerDef:

@AnalyzerDef(name = "myOwnAnalyzer",
tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
filters = {
    @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
    @TokenFilterDef(factory = LowerCaseFilterFactory.class),
    @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
        @Parameter(name = "pattern", value = "('-&\\.,\\(\\))"),
        @Parameter(name = "replacement", value = " "),
        @Parameter(name = "replace", value = "all")
    }),
    @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
        @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
        @Parameter(name = "replacement", value = ""),
        @Parameter(name = "replace", value = "all")
    }),
    @TokenFilterDef(factory = TrimFilterFactory.class),
    @TokenFilterDef(factory = NorwegianCollationFactory.class)
}
)
public class KikaPaya implements Serializable {
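
One thing worth checking in this filter chain: ASCIIFoldingFilter maps Å to A, Ø to O, and Æ to AE, and it runs before the collation filter, so the collator may never see the Norwegian letters. The plain-Java sketch below illustrates the effect under that assumption. The foldToAscii helper is a hypothetical stand-in that covers only these three letters, not the real Lucene filter, and the class and sample names are made up for illustration.

```java
import java.text.Collator;
import java.util.Locale;

class NorwegianFoldingCaveat {

    // Same locale as the factory above; equal to new Locale("no", "NO").
    static final Collator NORWEGIAN = Collator.getInstance(Locale.forLanguageTag("no-NO"));

    // Hypothetical stand-in for ASCIIFoldingFilter, limited to the three letters
    // discussed here: AE ligature to AE, O with stroke to O, A with ring to A.
    static String foldToAscii(String s) {
        return s.replace("\u00C6", "AE").replace("\u00E6", "ae")
                .replace("\u00D8", "O").replace("\u00F8", "o")
                .replace("\u00C5", "A").replace("\u00E5", "a");
    }

    // Text that reaches the collation step when folding runs first.
    static String withFolding(String s) {
        return foldToAscii(s).toLowerCase(Locale.ROOT);
    }

    // Text that reaches the collation step when folding is left out.
    static String withoutFolding(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String aRingName = "\u00C5balu"; // starts with A with ring

        // After folding, both names are the same string, so no collator can separate them.
        System.out.println(withFolding(aRingName).equals(withFolding("Abalu")));

        // Without folding, the Norwegian collator can place the A-with-ring name after Z.
        System.out.println(
                NORWEGIAN.compare(withoutFolding(aRingName), withoutFolding("Zabalu")) > 0);
    }
}
```

If the folded and unfolded results differ like this in your index, dropping ASCIIFoldingFilterFactory from the sort analyzer (while keeping LowerCaseFilterFactory for case-insensitivity) is one option to try.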

For more on using this collation filter with Hibernate Search 5, see https://stackoverflow.com/a/60738067/7179509
