UC Indexing
Index/re-index is the process of sending various entity metadata and ACLs to the search back-end (Apache Solr or OpenSearch). This is to make sure that various search features work correctly.
The following endpoints depend on search:
GET /storage/file
GET /storage/{id}/file
PUT /item
GET /item
GET /item/saved/{hash}
GET /search/saved/{hash}
GET /search
PUT /search
PUT /search/autocomplete
GET /search/shape
PUT /search/shape
GET /search/file
PUT /search/file
PUT /document/search
PUT /API/metadata-field/field-group
Normally, indexing/re-indexing happens automatically whenever a relevant entity is updated, for example on item or collection metadata updates, ACL updates, or file size changes.
It can also be triggered manually using:
PUT /item/{id}/re-index
PUT /reindex/{type}
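As a minimal sketch of the two manual triggers, the Python below only builds the PUT requests and does not send them. The base URL, the helper names, and the example item ID are placeholders invented for illustration; authentication and any path prefix depend on your installation.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical base URL; replace with the address of your own API server.
BASE_URL = "http://localhost:8080/API"


def item_reindex_request(item_id: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) PUT /item/{id}/re-index."""
    url = f"{base_url}/item/{quote(item_id, safe='')}/re-index"
    return Request(url, method="PUT")


def type_reindex_request(entity_type: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) PUT /reindex/{type}."""
    url = f"{base_url}/reindex/{quote(entity_type, safe='')}"
    return Request(url, method="PUT")
```

To actually send one of these, pass it to `urllib.request.urlopen` after adding whatever credentials your system requires.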
What does the indexing/re-indexing process look like internally?
There are two index processing routes internally:
Route A:
Some API request, storage scanning, or job processing thread makes changes to certain entities.
The same process handling the request/change builds a new Solr/OpenSearch document and sends it to the IndexQueue in ActiveMQ.
IndexCruncher or OpenSearchIndexCruncher, depending on whether Solr or OpenSearch is used, picks up the messages from IndexQueue and sends them to the search back-end. After this, the search results should reflect the latest changes.
Route B, which is the most common route:
Some API request, storage scanning, or job processing thread makes changes to certain entities.
The same process handling the request/change marks the relevant entries in the t_indexlog table as pending re-index.
ReindexCruncher picks up the changes in t_indexlog almost immediately, builds index documents for the related entities, and sends them to the IndexQueue in ActiveMQ.
IndexCruncher or OpenSearchIndexCruncher, depending on whether Solr or OpenSearch is used, picks up the messages from IndexQueue and sends them to the search back-end. After this, the search results should reflect the latest changes.
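The two stages of Route B can be illustrated with a toy in-memory model. This is not the product's implementation: the list stands in for t_indexlog, the deque for the IndexQueue, and the dict for the search back-end. The function names, row layout, and the "done" status code are assumptions made for the sketch.

```python
from collections import deque

# Assumed status codes: the text only implies that 1 means "pending".
PENDING, DONE = 1, 0


def reindex_cruncher(index_log, index_queue, build_document):
    """Stage 1: turn pending log rows into queued index documents."""
    for row in index_log:
        if row["status"] == PENDING:
            index_queue.append(build_document(row["entity_id"]))
            row["status"] = DONE


def index_cruncher(index_queue, backend):
    """Stage 2: drain the queue into the search back-end stand-in."""
    while index_queue:
        doc = index_queue.popleft()
        backend[doc["id"]] = doc
```

The point of the split is that the code changing an entity only has to flag a row; building and delivering the index document happens later, in separate workers.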
When to perform a re-index?
As mentioned above, re-indexing of an entity happens automatically, and the result should be searchable after a short while. Typically, a manual re-index is only needed when:
The search index is no longer available. For example: the Solr/OpenSearch index was not preserved after a system migration.
Some fatal error happened to ActiveMQ, and the unprocessed messages are lost.
You want to be extra sure that an entity is re-indexed.
How to perform a re-index
Perform the following requests in sequence to re-index the whole system:
PUT /reindex/acl
PUT /reindex/item
PUT /reindex/collection
PUT /reindex/file
PUT /reindex/document
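A minimal sketch of issuing these requests in the listed order is shown below. The `send_put` callable is hypothetical: it is supplied by the caller, issues the PUT against your server, and returns True on success. The sketch stops at the first failure and does not wait for each re-index to finish, since the text does not say whether that is required.

```python
from typing import Callable, List

# Order taken from the list above: ACLs first, then the entity types.
REINDEX_ORDER = ["acl", "item", "collection", "file", "document"]


def reindex_everything(send_put: Callable[[str], bool]) -> List[str]:
    """Call send_put for each path in order; stop at the first failure.

    Returns the paths that were sent successfully.
    """
    completed = []
    for entity_type in REINDEX_ORDER:
        path = f"/reindex/{entity_type}"
        if not send_put(path):
            break
        completed.append(path)
    return completed
```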
To re-index only the entities that changed within a certain time range, for example after the ActiveMQ failure mentioned above, use an SQL query like:
UPDATE t_indexlog SET c_status = 1 WHERE c_processed > '2022-01-18';
Remember to replace the date in the example with your own cutoff.
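To avoid editing the date by hand, the same update can be run with a bound parameter. The sketch below uses an in-memory SQLite table as a stand-in for the real database; the table has only the two columns named in the query, and the sample rows and the status value 0 are assumptions made for the demonstration.

```python
import sqlite3


def mark_pending_since(conn, cutoff: str) -> int:
    """Set c_status = 1 on rows processed after cutoff; return rows changed."""
    cur = conn.execute(
        "UPDATE t_indexlog SET c_status = 1 WHERE c_processed > ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


# Stand-in table with only the two columns used by the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_indexlog (c_status INTEGER, c_processed TEXT)")
conn.executemany(
    "INSERT INTO t_indexlog VALUES (?, ?)",
    [(0, "2022-01-10"), (0, "2022-01-19"), (0, "2022-01-20")],
)
changed = mark_pending_since(conn, "2022-01-18")  # flags the two later rows
```

Against the production database, the placeholder syntax depends on the driver in use (for example `%s` rather than `?`).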
To re-index a single item:
PUT /item/{id}/re-index