Aricie
The DNN Expert for your web project
DNN Community Forums

Content with localized characters returns no matches

errormzyt
New Member

Hi!

I'm evaluating the trial version of LuceneSearch and found a slight problem with localized characters, in our case the Swedish characters åäö.

If I perform a search with the keyword "projekt" I get all relevant matches, but if I instead search for "underhåll" I get no matches at all, even though there are both pages and HTML content containing the word. Is there any setting I need to change for this to work, or is this a bug?

Jesse

New Member

Hi,

There are several possible causes for your problem.

First, if you enable localized analyzers (the Swedish snowball analyzer is available, together with the corresponding stop words, in the app_data folder, as you probably figured out), you have to make sure that the same analyzer is used at indexing time and at querying time.
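
For illustration, here is a minimal standalone Lucene.NET sketch of what "same analyzer on both sides" means. It is not the module's actual code; the field name "content", the in-memory directory and the Lucene.Net 3.0.3 contrib SnowballAnalyzer are just assumptions for the example:

    using System;
    using Lucene.Net.Analysis.Snowball;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    using Version = Lucene.Net.Util.Version;

    class AnalyzerMatchDemo
    {
        static void Main()
        {
            // Hypothetical setup: an in-memory index and the Swedish snowball stemmer.
            var analyzer = new SnowballAnalyzer(Version.LUCENE_30, "Swedish");
            var directory = new RAMDirectory();

            // Index a document containing the word "underhåll".
            using (var writer = new IndexWriter(directory, analyzer, true,
                                                IndexWriter.MaxFieldLength.UNLIMITED))
            {
                var doc = new Document();
                doc.Add(new Field("content", "Planerat underhåll av servrarna",
                                  Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }

            // Query with the SAME analyzer: the query term is stemmed exactly like
            // the indexed term, so "underhåll" is found. Parsing the same string with
            // a plain StandardAnalyzer instead could easily produce zero hits.
            var parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
            using (var searcher = new IndexSearcher(directory, true))
            {
                var hits = searcher.Search(parser.Parse("underhåll"), 10);
                Console.WriteLine("Hits: " + hits.TotalHits);
            }
        }
    }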

There are parameters in both the indexing settings and the result settings controlling those two aspects independently.

You can also browse your index with Luke (http://code.google.com/p/luke/) to see what is actually in your Lucene index, and thus what processing was applied at indexing time.

A localized analyzer will usually account for the language's specificities by trimming word prefixes and suffixes and indexing/storing only the sanitized root of each term. This allows matching plurals, conjugated verbs and unaccented characters for broader and better results, but it only works if the same processing is applied to the search query.
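
If you want to see what the stemmer does to your terms without opening Luke, you can dump the tokens the analyzer produces. Again a rough sketch against the assumed Lucene.Net 3.0.3 attribute API, with made-up sample text; the exact stemmed forms depend on the Swedish snowball rules:

    using System;
    using System.IO;
    using Lucene.Net.Analysis.Snowball;
    using Lucene.Net.Analysis.Tokenattributes;
    using Version = Lucene.Net.Util.Version;

    class TokenDump
    {
        static void Main()
        {
            var analyzer = new SnowballAnalyzer(Version.LUCENE_30, "Swedish");

            // Run a few inflected forms through the analyzer and print the terms
            // that would actually be written to, or looked up in, the index.
            var stream = analyzer.TokenStream("content",
                new StringReader("underhåll underhållet projekt projekten"));
            var term = stream.AddAttribute<ITermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
                Console.WriteLine(term.Term);
        }
    }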

There is also currently an issue with the auto-complete feature, which uses a prefix query (equivalent to using the * wildcard to match various possible endings).

The way prefix and wildcard queries are processed by Lucene bypasses the tokenizing analyzer: Lucene looks for exact prefix matches to be completed with a specific ending.

We are currently thinking about workarounds for that last issue, where we would couple the current prefix query with a regular, properly analyzed search query. But again, this is only relevant for the skin object if you have enabled the auto-complete feature.
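
To give an idea of the kind of workaround we have in mind, a purely illustrative sketch (not the module's implementation; it reuses the assumed field name and analyzer from the example above) would OR the raw prefix query together with a properly analyzed query:

    using Lucene.Net.Analysis.Snowball;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Version = Lucene.Net.Util.Version;

    static class AutoCompleteWorkaround
    {
        // Builds a query for an auto-complete fragment such as "underhå".
        // The PrefixQuery clause bypasses the analyzer (exact prefix match on the
        // raw indexed terms), while the parsed clause goes through the Swedish
        // analyzer, so stemmed content can still match.
        public static Query Build(string userInput)
        {
            var analyzer = new SnowballAnalyzer(Version.LUCENE_30, "Swedish");
            var parser = new QueryParser(Version.LUCENE_30, "content", analyzer);

            var combined = new BooleanQuery();
            combined.Add(new PrefixQuery(new Term("content", userInput.ToLowerInvariant())),
                         Occur.SHOULD);
            combined.Add(parser.Parse(QueryParser.Escape(userInput)), Occur.SHOULD);
            return combined;
        }
    }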

A regular, non-prefixed search is only a matter of properly matching the querying analyzer with the indexing analyzer.

Regards
