读书人

搜寻关键字拼音智能提示实现

发布时间: 2013-08-13 16:43:28 作者: rapoo

搜索关键字拼音智能提示实现

----------------------------------------------

<copyField source="address" dest="suggest" />? ?

?

----------------------------------------------

?

<fieldType name="suggest_text" positionIncrementGap="100" autoGeneratePhraseQueries="true">
??????? <analyzer type="index">
??????????????? <tokenizer />
??????????????? <filter />
??????????????? <filter />
??????????????? <filter />
??????????????? <filter protected="protwords.txt" />
??????? </analyzer>
??????? <analyzer type="query">
??????????????? <tokenizer />
??????????????? <filter />
??????????????? <filter />
??????????????? <filter protected="protwords.txt" />
??????? </analyzer>
</fieldType>

?

?

?

solrconfig.xml

<searchComponent name="suggest">
??????? <str name="queryAnalyzerFieldType">suggest_text</str>
??????? <lst name="spellchecker">
??????????????? <str name="name">suggest</str>
??????????????? <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
??????????????? <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
??????????????? <str name="field">suggest</str>
??????????????? <float name="threshold">0</float>
??????????????? <str name="buildOnCommit">true</str>
??????? </lst>
??????? <lst name="spellchecker">
??????????????? <str name="name">default</str>
??????????????? <str name="field">suggest</str>
??????????????? <str name="classname">solr.DirectSolrSpellChecker</str>
??????????????? <str name="distanceMeasure">internal</str>
??????????????? <float name="accuracy">0.2</float>
??????????????? <int name="maxEdits">2</int>
??????????????? <int name="minPrefix">1</int>
??????????????? <int name="maxInspections">50</int>
??????????????? <int name="minQueryLength">2</int>
??????????????? <float name="maxQueryFrequency">0.01</float>
??????? </lst>
??????? <lst name="spellchecker">
??????????????? <str name="name">wordbreak</str>
??????????????? <str name="classname">solr.WordBreakSolrSpellChecker</str>
??????????????? <str name="field">suggest</str>
??????????????? <str name="combineWords">true</str>
??????????????? <str name="breakWords">true</str>
??????????????? <int name="maxChanges">10</int>
??????? </lst>
</searchComponent>
????????
????????
<requestHandler name="/suggest">
??????? <lst name="defaults">
??????????????? <str name="spellcheck">true</str>
??????????????? <str name="spellcheck.dictionary">default</str>
??????????????? <str name="spellcheck.dictionary">wordbreak</str>
??????????????? <str name="spellcheck.dictionary">suggest</str>
??????????????? <str name="spellcheck.onlyMorePopular">true</str>
??????????????? <str name="spellcheck.count">10</str>
??????????????? <str name="spellcheck.collate">true</str>
??????? </lst>
??????? <arr name="components">
??????????????? <str>suggest</str>
??????? </arr>
</requestHandler>

?

配置完成后,启动solrcloud:

?

solr1:
? ? ? java?-Djetty.port=8983?-Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=1 -jar start.jar &

?

solr2:
? ? ? java -Djetty.port=7574 -DzkHost=localhost:9983??-Dcollection.configName=myconf?-DnumShards=1 -jar start.jar &?

?

?查询:

第一次查询时,添加spellcheck.build=true参数来触发spellckecker建立索引,

http://localhost:8983/solr/collection1/suggest?q=cq&spellcheck.build=true&distrib=false?

而后:

http://localhost:8983/solr/collection1/suggest?q=cq&distrib=false?

?

存在问题

该方法存在的问题是:

? ? ?返回的结果是基于索引中字段的词频进行排序,不是用户搜索关键字的频率,因此不能将一些热门关键字排在前面(网上有人提到定制SuggestWordScoreComparator来实现)

? ? ?拼音提示,多音字,缩写还是要另外加索引字段

?

?

2. Solrcloud建立单独的collection,利用solr前缀查询实现

配置如下:

solr.xml:

?

<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
??? <core name="collection1" instanceDir="collection1" />
??? <core name="collection2" instanceDir="collection2" />
?</cores>

?

cp -r collection1 collection2?

修改collection2中的schema.xml和solrconfig.xml

?

schema.xml:

-----------------------fields--------------------------------

? ?<field name="kw" type="string" indexed="true" stored="true" />

? ?<field name="pinyin" type="string" indexed="true" stored="false" multiValued="true"/>

? ?<field name="abbre" type="string" indexed="true" stored="false" multiValued="true"/>

? ?<field name="kwfreq" type="int" indexed="true" stored="true" />

? ?<field name="_version_" type="long" indexed="true" stored="true"/>

?

? ?<field name="suggest" type="suggest_text" indexed="true" stored="false" multiValued="true" />

------------------multiValued表示字段是多值的-------------------------------------

注:

? ? ?kw为原始关键字

? ? ?pinyin和abbre的multiValued=true,在使用solrj建此索引时,定义成集合类型即可:如关键字“重庆”的pinyin字段为{chongqing,zhongqing}, abbre字段为{cq, zq}

? ? ?kwfreq为用户搜索关键的频率,用于查询的时候排序

?

------------------uk----------------------------------

? ?<uniqueKey>kw</uniqueKey>

? ?<defaultSearchField>suggest</defaultSearchField>?

-------------------------------------------------------

? <copyField source="kw" dest="suggest" />

? <copyField source="pinyin" dest="suggest" />

? <copyField source="abbre" dest="suggest" />

?

------------------suggest_text----------------------------------

? ? <!-- suggest type-->

? ? <fieldType name="suggest_text" positionIncrementGap="100" autoGeneratePhraseQueries="true">

? ? ? ? <analyzer type="index">

? ? ? ? ? ? ? ? <tokenizer />

? ? ? ? ? ? ? ? <filter />

? ? ? ? ? ? ? ? <filter />

? ? ? ? ? ? ? ? <filter />

? ? ? ? ? ? ? ? <filter protected="protwords.txt" />

? ? ? ? </analyzer>

? ? ? ? <analyzer type="query">

? ? ? ? ? ? ? ? <tokenizer />

? ? ? ? ? ? ? ? <filter />

? ? ? ? ? ? ? ? <filter />

? ? ? ? ? ? ? ? <filter protected="protwords.txt" />

? ? ? ? </analyzer>

? ? </fieldType>

?

?

启动命令:

solr1:
java -Xms1024m -Xmx1024m -Djetty.port=8983 -Dbootstrap_conf=true?-DzkHost=localhost:9983?-DnumShards=1 -jar start.jar &

solr2:
java -Djetty.port=7574?-DzkHost=localhost:9983??-DnumShards=1 -jar start.jar &?

?

?

solrj前缀查询语法:

?

? ? private SolrQuery getSuggestQuery(String prefix, Integer limit) {

? ? ? ? SolrQuery solrQuery = new SolrQuery();

? ? ? ? StringBuilder sb = new StringBuilder();

? ? ? ? sb.append("kw:").append(prefix).append("*");

? ? ? ? sb.append(" or pinyin:").append(prefix).append("*");

? ? ? ? sb.append(" or abbre:").append(prefix).append("*");

? ? ? ? solrQuery.setQuery(sb.toString());

?

? ? ? ? solrQuery.addField("kw");

? ? ? ? solrQuery.addField("kwfreq");

? ? ? ? solrQuery.addSort("kwfreq", SolrQuery.ORDER.desc);

? ? ? ? solrQuery.setStart(0);

? ? ? ? solrQuery.setRows(limit);

?

? ? ? ? return solrQuery;

? ? }

?

3. mongodb实现拼音智能提示

? ? 实现原理类似于方案2,利用mongodb强调的查询功能,数组和正则匹配。参考http://www.2cto.com/database/201203/123450.html

? ? 经过测试,效果与方案2类似。

?

四、总结

? ? 以上三个方案都能实现搜索关键字智能提示,方案1还需要研究完善才能实现基于用户搜索频率热词排序,方案3需要另外部署mongodb服务,不利用维护,所以本人在实际项目中采用了方案2.

?

?

?

?

?

?

?

?

?

?

读书人网 >开源软件

热点推荐