In the fourth and final part of the High Availability series we take a look at how you can use SolrCloud together with Vidispine for high availability, keeping your media assets searchable at all times.
In this fourth and final part of the high availability series we will show you how to use SolrCloud together with Vidispine. In Part #1 we made a simple Vidispine cluster, in Part #2 we added HAProxy, and in Part #3 we added pgpool-II for a high availability database.
How does SolrCloud work
In SolrCloud, a logical index is called a collection. A collection is split into a number of shards, and each shard contains a number of Solr instances. One of them is the leader of the shard and the others are replicas. If a leader fails, one of the replicas is elected as the new leader. SolrCloud uses Zookeeper to coordinate the distributed Solr instances.
Each time a Solr instance receives a document, it forwards the document to its leader, which calculates the hash of the document id. Based on this hash, the document is forwarded to the leader of the destination shard, which then distributes it to its replicas. As a result, a logical index is split evenly across the shards, and all the instances in a shard should contain the same index.
Hence, it is very important that no shard fails completely, which can happen when, for example, a shard is not replicated and its only instance goes down. In that case SolrCloud would not function, and index consistency could not be guaranteed.
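The routing idea can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function name shard_for_id is hypothetical, and a CRC checksum from cksum combined with a modulo stands in for Solr's own hash function, which assigns each shard a range of hash values instead.

```shell
# Simplified illustration of hash-based document routing.
# Assumption: cksum plus a modulo stands in for Solr's real hash
# function and its per-shard hash ranges.

# shard_for_id DOC_ID NUM_SHARDS
# Prints a shard index between 0 and NUM_SHARDS-1.
shard_for_id() {
    doc_id=$1
    num_shards=$2
    # cksum prints "<checksum> <byte count>"; keep only the checksum.
    checksum=$(printf '%s' "$doc_id" | cksum | cut -d ' ' -f 1)
    echo $((checksum % num_shards))
}
```

Calling shard_for_id with the same id and shard count always prints the same index, which is the property that lets any instance work out where a document belongs.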
Installation and configuration
Download Zookeeper 3.4.6 from https://apache.mirrors.spacedump.net/zookeeper/zookeeper-3.4.6/ and unzip it.
cd zookeeper-3.4.6
mkdir data
cp conf/zoo_sample.cfg conf/zoo.cfg
Edit dataDir in conf/zoo.cfg to the path of the created data folder, then start Zookeeper (bin/zkServer.sh start).
Your Zookeeper should now be running at localhost:2181.
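If you prefer to script the dataDir change, a minimal sketch could look like the following. The helper name set_data_dir is hypothetical, and it assumes the config file already contains a dataDir= line, as the sample config does.

```shell
# set_data_dir CFG_FILE DATA_DIR
# Rewrites the dataDir line of a Zookeeper config file in place.
# Assumption: the file already has a line starting with "dataDir=".
set_data_dir() {
    cfg=$1
    data_dir=$2
    # Use '|' as the sed delimiter because paths contain '/'.
    sed "s|^dataDir=.*|dataDir=$data_dir|" "$cfg" > "$cfg.tmp" \
        && mv "$cfg.tmp" "$cfg"
}

# Example, run inside zookeeper-3.4.6:
#   set_data_dir conf/zoo.cfg "$(pwd)/data"
#   bin/zkServer.sh start
```

Writing to a temporary file and moving it into place avoids relying on sed's in-place flag, which behaves differently between GNU and BSD sed.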
Download Solr 4.10.4 from https://archive.apache.org/dist/lucene/solr/4.10.4/ and unzip it.
cd solr-4.10.4
cp -r example solrInstance-1
cd solrInstance-1
Replace schema.xml and solrconfig.xml under solr/collection1/conf/ with the Vidispine schema and config.
Edit /etc/hosts so that the host name points to the machine's IP address instead of the loopback IP.
Repeat the steps above on your other Solr servers.
# Our Zookeeper is running at 10.185.20.100:2181
# On Instance 0:
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=VidiConfigTest -DzkHost=10.185.20.100:2181 -DnumShards=2 -jar start.jar
# On Instance 1:
java -DzkHost=10.185.20.100:2181 -jar start.jar
# On Instance 2:
java -DzkHost=10.185.20.100:2181 -jar start.jar
# On Instance 3:
java -DzkHost=10.185.20.100:2181 -jar start.jar
Please note that you only need to specify -DnumShards and -Dbootstrap_confdir when you start the first instance. You may change the number of shards according to your needs.
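The difference between the first instance and the others can be captured in a small helper. This is a sketch, not part of Solr: the function name solr_start_cmd is hypothetical, and it only prints the command line used in this post so that you can inspect it before running it.

```shell
# solr_start_cmd ZK_HOST [NUM_SHARDS]
# Prints the java command line for one Solr instance.
# Pass NUM_SHARDS only for the first instance, which uploads the
# configuration to Zookeeper; omit it for all other instances.
solr_start_cmd() {
    zk_host=$1
    num_shards=${2:-}
    if [ -n "$num_shards" ]; then
        echo "java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=VidiConfigTest -DzkHost=$zk_host -DnumShards=$num_shards -jar start.jar"
    else
        echo "java -DzkHost=$zk_host -jar start.jar"
    fi
}

# Example:
#   solr_start_cmd 10.185.20.100:2181 2   # first instance
#   solr_start_cmd 10.185.20.100:2181     # every other instance
```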
You can simply use “screen” to start Solr in the background and redirect the log to a file:
screen -S solr -d -m /bin/bash -c 'java -DzkHost=10.185.20.100:2181 -jar start.jar > solr.log'
Alternatively, follow this wiki page to set up Solr logging: https://wiki.apache.org/solr/SolrLogging.
There is a nice admin page on each of your Solr instances: http://localhost:8983/solr
For more info about SolrCloud, please refer to https://cwiki.apache.org/confluence/display/solr/SolrCloud
SolrCloud test with docker
You can also test SolrCloud using this docker-compose environment, which sets up a ZooKeeper instance and a SolrCloud instance with numShards=2.
Download https://transfer.vidispine.com/d57/b84ad995cc314552c44775fdcaae9/solrcloud-docker.tar.gz and then run the following:
$ # install docker and docker-compose, then:
$ tar -xvzf solrcloud-docker.tar.gz
$ cd solrcloud-docker/
$ docker-compose up
$ docker ps # to get the ports
What’s the API call that you use to split the shards? GET solr/admin/collections?action=SPLITSHARD?
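On the question above: Solr's Collections API does expose a SPLITSHARD action, which takes the collection and the shard to split as parameters. A minimal sketch of building such a request follows. The helper name split_shard_url is hypothetical, and the host, collection name, and shard name in the example are assumptions that you should check against your own cluster on the admin page.

```shell
# split_shard_url SOLR_HOST COLLECTION SHARD
# Prints the Collections API URL that asks Solr to split one shard.
split_shard_url() {
    solr_host=$1
    collection=$2
    shard=$3
    echo "http://$solr_host/solr/admin/collections?action=SPLITSHARD&collection=$collection&shard=$shard"
}

# Example with assumed names (verify them in your cluster first):
#   curl "$(split_shard_url localhost:8983 collection1 shard1)"
```

Quote the URL when you pass it to curl, otherwise the shell interprets the & characters.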
This concludes our posts on Vidispine High Availability configuration. You can find the other posts here: