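Basic tokenizer and filter configuration for English applications in Solr
Typical order of the tokenizer and filters for English applications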
  Index:
    1: Split on whitespace: WhitespaceTokenizer
    2: Remove stop words (e.g. on, of, a, an): StopFilter
    3: Split words into parts: WordDelimiterFilter
    4: Lowercase: LowerCaseFilter
    5: Stem English words: EnglishPorterFilter
    6: Remove duplicate tokens: RemoveDuplicatesTokenFilter
  Query (the tokenizer is applied first here as well):
    1: Expand query synonyms: SynonymFilter
    2: Remove stop words: StopFilter
    3: Split words into parts: WordDelimiterFilter
    4: Lowercase: LowerCaseFilter
    5: Stem English words: EnglishPorterFilter
    6: Remove duplicate tokens: RemoveDuplicatesTokenFilter
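
To make the effect of this ordering concrete, here is a rough Python sketch that imitates the two chains on plain strings. It is not Solr code and does not reproduce Solr's exact behaviour: the word lists are hypothetical stand-ins for stopwords.txt, protwords.txt and synonyms.txt, `crude_stem` is a naive suffix stripper rather than the Porter algorithm, and positions are tracked only per whitespace token.

```python
import re

# Hypothetical stand-ins for stopwords.txt, protwords.txt and synonyms.txt.
STOP_WORDS = {"on", "of", "a", "an", "the"}
PROTECTED = {"news"}
SYNONYMS = {"tv": ["tv", "television"]}

# Word parts: a capitalised or lowercase run, an all-caps run, or a digit run.
PART_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def split_word(token, catenate_words):
    """Very rough imitation of WordDelimiterFilter (assumption, not Solr's logic)."""
    parts = PART_RE.findall(token)
    # Only all-alphabetic splits are re-joined, loosely mirroring catenateWords="1".
    if catenate_words and len(parts) > 1 and all(p.isalpha() for p in parts):
        parts.append("".join(parts))
    return parts


def crude_stem(term):
    """Naive suffix stripping; a placeholder for the real Porter stemmer."""
    if term in PROTECTED:  # protected words are never stemmed
        return term
    for suffix in ("ing", "ed", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def analyze(text, query=False):
    """Run the index chain, or the query chain when query=True."""
    # Tokenizer: split on whitespace, remembering each token's position.
    tokens = [(t, pos) for pos, t in enumerate(text.split())]
    if query:
        # Query chain only: expand synonyms, matching case-insensitively.
        tokens = [(s, pos) for t, pos in tokens
                  for s in SYNONYMS.get(t.lower(), [t])]
    # Stop-word removal (case-insensitive).
    tokens = [(t, pos) for t, pos in tokens if t.lower() not in STOP_WORDS]
    # Word splitting; catenation is enabled at index time only.
    tokens = [(part, pos) for t, pos in tokens
              for part in split_word(t, catenate_words=not query)]
    # Lowercasing, then stemming.
    tokens = [(crude_stem(t.lower()), pos) for t, pos in tokens]
    # Drop tokens that repeat the same term at the same position.
    seen, result = set(), []
    for tok in tokens:
        if tok not in seen:
            seen.add(tok)
            result.append(tok)
    return [term for term, _ in result]
```

With these assumptions, `analyze("Photos of the Wi-Fi PowerShot SD500")` yields `photo, wi, fi, wifi, power, shot, powershot, sd, 500`, while `analyze("TV of Wi-Fi", query=True)` yields `tv, television, wi, fi`. Catenating only at index time lets a query for "wifi" still match a document that contained "Wi-Fi".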
An example configuration:
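<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>

<field name="name" type="text" indexed="true" stored="true" multiValued="true"/>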
For more filter configurations, see the Solr wiki: http://wiki.apache.org/solr/FrontPage