Aricie
Integrator of your solutions for the future
DNN Community Forums

Large Number of Documents to Index



G














    Indexing large volumes of information is problematic: the process simply runs out of memory when the SearchItemInfoCollection grows too large.

    With the DNN implementation of the SearchDataStoreProvider (which stores everything in the relational database), I would simply limit how many SearchItemInfo objects I returned for each re-index. The DNN SearchDataStoreProvider never deleted anything; it only updated a SearchItemInfo if its PubDate was newer.

    The Aricie SearchDataStoreProvider seems to delete SearchItemInfo entries on each re-index. Although this is more intuitive behavior than the DNN implementation, it fails for any strategy that involves multiple passes over the indexing modules.

    Can I achieve behavior similar to DNN's implementation using existing Aricie settings? If not, would it be possible to add a feature to Aricie so that items are not removed?
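    To make the multi-pass problem concrete, here is a minimal Python sketch of the two behaviors described above. It is not DNN or Aricie code: `Item`, `upsert_by_pub_date`, `replace_with_batch`, and `batches` are hypothetical names, and the in-memory dictionary only stands in for a real search data store.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a DNN SearchItemInfo."""
    key: str
    pub_date: datetime
    body: str


def upsert_by_pub_date(store: dict, batch: list) -> None:
    """Update-only merge: add new items, replace older ones, never delete."""
    for item in batch:
        existing = store.get(item.key)
        if existing is None or item.pub_date > existing.pub_date:
            store[item.key] = item


def replace_with_batch(store: dict, batch: list) -> None:
    """Delete-on-reindex merge: anything absent from this batch is removed."""
    upsert_by_pub_date(store, batch)
    batch_keys = {item.key for item in batch}
    for key in list(store):
        if key not in batch_keys:
            del store[key]


def batches(items: list, size: int):
    """Yield the items in limited passes of at most `size` items each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


docs = [Item(f"doc-{i}", datetime(2020, 1, 1), f"body {i}") for i in range(10)]

update_only, delete_missing = {}, {}
for batch in batches(docs, 4):
    upsert_by_pub_date(update_only, batch)
    replace_with_batch(delete_missing, batch)

print(len(update_only))     # all 10 documents survive the three passes
print(len(delete_missing))  # only the last pass (2 documents) remains
```

    With the update-only merge, each pass adds to the store, so a module can return a limited slice of its items per re-index. With the delete-on-reindex merge, every pass discards whatever the previous passes indexed.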


    samyb














      Hello Gary,

      For indexing, LuceneSearch uses the native provider for each ISearchable module, so the number of SearchItemInfo objects will more or less match what the native DNN provider would return. Each returned item is then checked to see whether its data has changed since the last indexing run. DNN uses the PubDate for this check, and you can make LuceneSearch use the PubDate as well by enabling the setting Indexation Settings > Existing index > Trust publication dates.
      You can also tweak the indexation step settings under Indexation Settings > Existing Index to tune how much time is spent and how many items are processed when indexing; this can help depending on your number of documents.
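      As a rough illustration of what trusting publication dates means for the change check, here is a small Python sketch. It is not LuceneSearch code: `needs_reindex` and its parameters are hypothetical, and the content comparison used when dates are not trusted is an assumption, since this thread does not say how LuceneSearch detects changes in that case.

```python
from datetime import datetime
from typing import Optional


def needs_reindex(
    new_pub_date: datetime,
    new_body: str,
    indexed_pub_date: Optional[datetime],
    indexed_body: Optional[str],
    trust_pub_dates: bool,
) -> bool:
    """Decide whether a returned item should be written to the index again."""
    if indexed_pub_date is None or indexed_body is None:
        return True  # never indexed before
    if trust_pub_dates:
        # Cheap check: rely on the module to bump PubDate when content changes.
        return new_pub_date > indexed_pub_date
    # Assumed fallback: compare the content itself, ignoring the date.
    return new_body != indexed_body


same_date = datetime(2020, 1, 1)

# The body changed but the module did not update PubDate.
print(needs_reindex(same_date, "new text", same_date, "old text", True))   # False
print(needs_reindex(same_date, "new text", same_date, "old text", False))  # True
```

      The trade-off in this sketch is that the date check is cheaper but only as reliable as the PubDate values the modules return.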

      Best regards,
      Samy

      ETA: I forgot to add that we currently have no way to disable the deletion of obsolete documents from the Lucene index. Disabling this deletion would have side effects, such as duplicate documents in the search results, so this option cannot be considered at this time.