langid.map.individual not working in Solr


I am trying to perform multiple language detections on the fields of documents submitted to Solr. This is the language identification part of my solrconfig.xml:

 <updateRequestProcessorChain name="langid">
   <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
     <str name="langid.fl">title,description</str>
     <str name="langid.langField">language</str>
     <str name="langid.langsField">languages</str>
     <bool name="langid.map">true</bool>
     <bool name="langid.map.keepOrig">true</bool>
     <str name="langid.whitelist">cjk,ckb,ar,bg,ca,cz,da,de,el,en,es,et,eu,fa,fi,fr,ga,gl,hi,hu,hy,id,it,ja,ko,lv,nl,no,pt,ro,ru,sv,tr</str>
     <str name="langid.fallback">tg</str>
     <bool name="langid.map.individual">true</bool>
     <str name="langid.map.individual.fl">websiteKeywords,websiteDescription,websiteTitle,websiteContent</str>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>

I detect the language of the document based on the title and description fields. That part works as expected: Solr creates two new fields, title_<languagecode> and description_<languagecode>, for example title_en and description_en.

I also want to detect the language of websiteKeywords, websiteDescription, websiteTitle and websiteContent separately, and map those fields to websiteKeywords_en, websiteDescription_en, websiteTitle_en and websiteContent_en. With the current configuration that does not happen: Solr only stores the original fields (websiteKeywords, websiteDescription, websiteTitle, websiteContent). What am I doing wrong? Your help is much appreciated. I am using Solr 8.9.0.

Answer by sanjihan

I had misunderstood what langid.map.individual and langid.map.individual.fl represent. langid.map.individual.fl selects a subset of the fields listed in langid.fl; any field named there that is not also in langid.fl has no effect. In other words, langid.map.individual applies to the fields in langid.fl, or to the subset of them given in langid.map.individual.fl. So the website* fields have to be added to langid.fl as well.
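A minimal sketch of the adjusted chain, assuming everything else stays as in the question: the only change is that the four website* fields are appended to langid.fl, so that langid.map.individual.fl now names a real subset of it. Treat this as an untested illustration of the answer, not a verified configuration.

```xml
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- The website* fields are now listed here too; langid.map.individual.fl
         can only pick fields that already appear in langid.fl -->
    <str name="langid.fl">title,description,websiteKeywords,websiteDescription,websiteTitle,websiteContent</str>
    <str name="langid.langField">language</str>
    <str name="langid.langsField">languages</str>
    <bool name="langid.map">true</bool>
    <bool name="langid.map.keepOrig">true</bool>
    <str name="langid.whitelist">cjk,ckb,ar,bg,ca,cz,da,de,el,en,es,et,eu,fa,fi,fr,ga,gl,hi,hu,hy,id,it,ja,ko,lv,nl,no,pt,ro,ru,sv,tr</str>
    <str name="langid.fallback">tg</str>
    <bool name="langid.map.individual">true</bool>
    <!-- Each of these fields gets its own detection and suffix (e.g. websiteTitle_en);
         title and description are not listed, so they keep the document-level language -->
    <str name="langid.map.individual.fl">websiteKeywords,websiteDescription,websiteTitle,websiteContent</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

One side effect worth checking: because langid.fl also feeds the document-level detection, the website* text presumably now influences the value stored in the language field as well, not just title and description.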