Francesco - 1 month ago
PHP Question

Solr issue with words containing dashes, hyphens, etc.

For some reason my Solr installation behaves oddly (I'm also a newbie on this topic).

Example:
In my DB I have an item named "Brandname XX-7 Yadda Ladida".

If I search for:

Brandname XX7
The item does not appear in the results at all (first 20).

Brandname XX-7
I get the expected result in 8th position; the first position is taken by the item "Brandname XX-2 Yadda Ladida".

Brandname XX-7 Ladida
I get the expected result in 7th position; the first position is taken by the item "Brandname XX-2 Yadda Ladida".

Brandname XX-7 Yadda Ladida
I get the expected result AGAIN in 7th position; the first position is taken by the item "Brandname XX-2 Yadda Ladida".


PS: everything is case-insensitive.

What am I doing wrong?
Please advise.

This is my managed-schema XML file:
http://pastebin.com/Z9nc36QD

UPDATE
This is the debug output for an example query searching for "Brandname xx-7":

    "debug":{
      "rawquerystring":"Brandname xx-7",
      "querystring":"Brandname xx-7",
      "parsedquery":"_text_:Brandname (_text_:xx _text_:7)",
      "parsedquery_toString":"_text_:Brandname (_text_:xx _text_:7)",

Answer

OK, now it works. I simply removed this line from my schema

    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25" />

and added

    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />

My final code (my field is of type text_general):

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>

      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
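A possible variant for Solr 6.5 and later, where WordDelimiterFilterFactory is deprecated in favor of WordDelimiterGraphFilterFactory, is sketched below. It is an untested assumption, not the configuration from the answer above. The debug output in the question shows that `xx-7` already reaches the query as two separate terms, `xx` and `7`: StandardTokenizerFactory drops the hyphen before any filter runs, so catenation options cannot join the parts back together. The sketch therefore swaps in WhitespaceTokenizerFactory so the delimiter filter sees `XX-7` as one token, enables catenateAll at index time so that a search for `XX7` can match an indexed `xx7`, and adds FlattenGraphFilterFactory, which the Solr documentation requires after a graph filter in an index-time analyzer. The specific attribute values are my own choices.

```xml
<!-- Hypothetical variant for Solr 6.5+; untested against the original data. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <!-- Split on whitespace only, so "XX-7" reaches the delimiter filter intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- "XX-7" is expected to yield roughly: XX-7 (original), XX, 7, XX7 (catenateAll) -->
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Required at index time after a graph-producing filter -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- No catenation at query time; the original token plus its parts are enough to hit the indexed forms -->
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One trade-off: WhitespaceTokenizerFactory does not split on other punctuation, so the word delimiter filter becomes responsible for that too. Any change to the index analyzer only takes effect after reindexing, and checking the token output for "Brandname XX-7" in the Admin UI's Analysis screen first is a cheap way to confirm the behavior.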