Indexing (or re-indexing) is the process of sending entity metadata and ACLs to the search back-end (Apache Solr or Elasticsearch). Search features rely on this index, so it must be kept up to date for search results to be correct.

Below is a list of endpoints that depend on search.

  • GET /storage/file

  • GET /storage/{id}/file

  • PUT /item

  • GET /item

  • GET /item/saved/{hash}

  • GET /search/saved/{hash}

  • GET /search

  • PUT /search

  • PUT /search/autocomplete

  • GET /search/shape

  • PUT /search/shape

  • GET /search/file

  • PUT /search/file

  • PUT /document/search

  • PUT /metadata-field/field-group

Normally, indexing happens automatically whenever a relevant entity is updated, for example after an item or collection metadata update, an ACL update, or a file size change.

Re-indexing can also be triggered manually using:

  • PUT /item/{id}/re-index

  • PUT /reindex/{type}
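The two manual triggers are plain HTTP PUT requests. The sketch below only builds them with Python's standard urllib.request module. It assumes the API is reachable at http://localhost:8080/API and uses HTTP Basic authentication; both are assumptions to adjust for your installation, and the helper names are hypothetical.

```python
import base64
import urllib.request

# Assumption: base URL of the API; adjust to your installation.
BASE_URL = "http://localhost:8080/API"


def _basic_auth(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return f"Basic {token}"


def reindex_item_request(item_id: str, user: str, password: str) -> urllib.request.Request:
    """Build a PUT /item/{id}/re-index request for a single item."""
    return urllib.request.Request(
        f"{BASE_URL}/item/{item_id}/re-index",
        method="PUT",
        headers={"Authorization": _basic_auth(user, password)},
    )


def reindex_type_request(entity_type: str, user: str, password: str) -> urllib.request.Request:
    """Build a PUT /reindex/{type} request, e.g. for "acl" or "item"."""
    return urllib.request.Request(
        f"{BASE_URL}/reindex/{entity_type}",
        method="PUT",
        headers={"Authorization": _basic_auth(user, password)},
    )
```

To actually send a request, pass it to `urllib.request.urlopen(req)`; building and sending are kept apart so the requests can be inspected before anything touches a live system.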

How does indexing/re-indexing work internally?

There are two internal indexing routes:

Route A:

  1. An API request, a storage scan, or a job-processing thread changes one or more entities.

  2. The same process that handles the request or change builds a new Solr/Elasticsearch document and sends it to the IndexQueue in ActiveMQ.

  3. IndexCruncher or ElasticSearchIndexCruncher, depending on whether Solr or Elasticsearch is used, picks up the messages from IndexQueue and sends them to the search back-end. After this, search results should reflect the latest changes.
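Route A can be modelled in a few lines of Python to show where the work happens. This is a conceptual sketch, not VidiCore code: an in-process `queue.Queue` stands in for IndexQueue in ActiveMQ, a dictionary stands in for Solr/Elasticsearch, and the names `handle_change`, `index_cruncher` and `SearchBackend` are hypothetical. The point it illustrates is that in Route A the thread making the change builds the document itself.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class SearchBackend:
    """Stand-in for Solr/Elasticsearch: stores documents by entity id."""
    docs: dict = field(default_factory=dict)

    def index(self, doc: dict) -> None:
        self.docs[doc["id"]] = doc


def handle_change(entity_id: str, metadata: dict, index_queue: queue.Queue) -> None:
    """Steps 1-2: the thread that changed the entity builds the document
    itself and enqueues it (standing in for IndexQueue in ActiveMQ)."""
    doc = {"id": entity_id, **metadata}
    index_queue.put(doc)


def index_cruncher(index_queue: queue.Queue, backend: SearchBackend) -> int:
    """Step 3: drain the queue and push each document to the back-end.
    A real cruncher runs continuously; this one drains once and returns
    the number of documents indexed."""
    count = 0
    while True:
        try:
            doc = index_queue.get_nowait()
        except queue.Empty:
            return count
        backend.index(doc)
        index_queue.task_done()
        count += 1
```

For example, after `handle_change("VX-1", {"title": "Intro"}, q)` a call to `index_cruncher(q, backend)` returns 1 and `backend.docs["VX-1"]` holds the new document.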

Route B, which is the most common route:

  1. An API request, a storage scan, or a job-processing thread changes one or more entities.

  2. The same process that handles the request or change marks the relevant entries in the t_indexlog table as pending re-index.

  3. ReindexCruncher picks up the pending entries in t_indexlog almost immediately, builds index documents for the related entities, and sends them to the IndexQueue in ActiveMQ.

  4. IndexCruncher or ElasticSearchIndexCruncher, depending on whether Solr or Elasticsearch is used, picks up the messages from IndexQueue and sends them to the search back-end. After this, search results should reflect the latest changes.
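Route B differs from Route A in who builds the document. The conceptual sketch below models it in Python; it is not VidiCore code. A dictionary stands in for the t_indexlog table, the status values `PENDING` and `DONE` are hypothetical, a `queue.Queue` stands in for IndexQueue in ActiveMQ, another dictionary stands in for the search back-end, and all function names are hypothetical.

```python
import queue

# Hypothetical status values for the stand-in index log.
PENDING, DONE = "pending", "done"


def mark_pending(index_log: dict, entity_id: str) -> None:
    """Step 2: the request thread only flags the entity; it builds no document."""
    index_log[entity_id] = PENDING


def reindex_cruncher(index_log: dict, entities: dict, index_queue: queue.Queue) -> None:
    """Step 3: pick up pending entries, build documents from the entity's
    current state, and enqueue them."""
    for entity_id, status in index_log.items():
        if status == PENDING:
            index_queue.put({"id": entity_id, **entities[entity_id]})
            index_log[entity_id] = DONE  # updating an existing key is safe while iterating


def index_cruncher(index_queue: queue.Queue, backend: dict) -> None:
    """Step 4: send queued documents to the search back-end."""
    while not index_queue.empty():
        doc = index_queue.get()
        backend[doc["id"]] = doc
```

In this sketch, marking the same entity twice before `reindex_cruncher` runs still produces a single document, because the document is built from the entity's state at processing time rather than at the time of the change.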

When should you perform a re-index?

As mentioned above, an entity is re-indexed automatically, and the changes should become searchable shortly afterwards. A manual re-index is typically only needed when:

  • The search index is no longer available, for example because the Solr/Elasticsearch index was not preserved during a system migration.

  • A fatal error occurred in ActiveMQ and unprocessed messages were lost.

  • You want to be extra sure that an entity is re-indexed.

How to perform a re-index

Perform the following requests in sequence to re-index the whole system.

  • PUT /reindex/acl

  • PUT /reindex/item

  • PUT /reindex/collection

  • PUT /reindex/file

  • PUT /reindex/document
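The full re-index can be scripted as a loop over the entity types in the order listed above. This is a sketch under assumptions: Basic authentication, a caller-supplied base URL, and hypothetical helper names. The `send` parameter defaults to a real HTTP call but can be replaced, for instance to dry-run the sequence. The sketch only issues the requests one after another and stops at the first failure; it does not wait for each re-index job to finish before starting the next.

```python
import base64
import urllib.request
from typing import Callable, List

# Order from the list above: ACLs first, then each entity type.
REINDEX_ORDER = ["acl", "item", "collection", "file", "document"]


def build_reindex_request(base_url: str, entity_type: str, auth: str) -> urllib.request.Request:
    """Build a PUT {base_url}/reindex/{entity_type} request."""
    return urllib.request.Request(
        f"{base_url}/reindex/{entity_type}",
        method="PUT",
        headers={"Authorization": auth},
    )


def urlopen_status(req: urllib.request.Request) -> int:
    """Default sender: perform the request and return the HTTP status code.
    urlopen raises HTTPError for 4xx/5xx responses."""
    with urllib.request.urlopen(req) as resp:
        return resp.status


def reindex_all(base_url: str, user: str, password: str,
                send: Callable[[urllib.request.Request], int] = urlopen_status) -> List[str]:
    """Issue the re-index requests in order, stopping at the first failure.
    Returns the entity types whose request was accepted."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    auth = f"Basic {token}"
    accepted = []
    for entity_type in REINDEX_ORDER:
        status = send(build_reindex_request(base_url, entity_type, auth))
        # Covers custom senders that return an error status instead of raising.
        if not 200 <= status < 300:
            raise RuntimeError(f"PUT /reindex/{entity_type} returned HTTP {status}")
        accepted.append(entity_type)
    return accepted
```

A typical call is `reindex_all("http://localhost:8080/API", user, password)`, where the URL is an assumed default to be replaced with your own.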

To re-index only the entities that changed within a certain time range, for example after the ActiveMQ failure mentioned above, run an SQL statement like the following:

update t_indexlog set c_status = 1 where c_processed > '2022-01-18';

Replace the date in the example with the start of the affected time range.
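Instead of editing the date literal by hand, the statement can be run with the date as a bound parameter. The sketch below uses Python's built-in sqlite3 module against a hypothetical, minimal stand-in for t_indexlog so that it runs on its own; it assumes, as the statement above implies, that setting c_status = 1 queues a row for re-indexing. Against the real database you would use that database's own driver, whose placeholder syntax may differ (psycopg2 for PostgreSQL uses %s instead of ?).

```python
import sqlite3


def mark_for_reindex_since(conn: sqlite3.Connection, since: str) -> int:
    """Run the t_indexlog update with the date as a bound parameter.
    Returns the number of rows updated."""
    cur = conn.execute(
        "UPDATE t_indexlog SET c_status = 1 WHERE c_processed > ?",
        (since,),
    )
    conn.commit()
    return cur.rowcount


def create_demo_db() -> sqlite3.Connection:
    """Hypothetical minimal t_indexlog in an in-memory SQLite database:
    an id plus the two columns the statement uses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_indexlog (c_id INTEGER, c_status INTEGER, c_processed TEXT)")
    conn.executemany(
        "INSERT INTO t_indexlog VALUES (?, ?, ?)",
        # Status 0 is a hypothetical "already processed" value for the demo.
        [(1, 0, "2022-01-17 09:00:00"), (2, 0, "2022-01-19 12:00:00")],
    )
    conn.commit()
    return conn
```

With the demo data, `mark_for_reindex_since(create_demo_db(), "2022-01-18")` updates only the row processed after that date and returns 1.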

To re-index a single item:

PUT /item/{id}/re-index