Failover time is too long

Hi, I hope that someone here knows the answer to this simple question:

We have a partitioned cluster with synchronous replication. When the primary node fails, it takes about 5 seconds for the backup to detect the failure and start working. The question is: is this normal? We really need a shorter time. How can we change it?

Thanks, Kate

asked 2007-05-01

updated 2013-08-08

Do you see the 5-second delay from the client side?
Are you using transactions?
In general, failover should take a very short time.
Do you have IWorkers running at the backup space that take time to wake up once their host space becomes active?


answered 2007-05-01

It is a testing application, so from my debug messages I can see that the backup's IWorker initializes only after 5 seconds. Before that, there is a log message saying that the backup did not get a heartbeat from the primary for 4500 ms. I thought that the backup waits for 4.5 seconds and only then detects that the primary has failed. Is that right? We use only local transactions.

taki ( 2007-05-01 )

Please lower the <fail-over-find-timeout> value in the cluster schema (located at GigaSpaces Root\config\schemas\sync_replicated-cluster-schema.xsl) and retry.


shay hassidim ( 2007-05-01 )
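
For illustration, the change suggested above might look like the fragment below. Only the element name comes from this thread. The value 1000 is an arbitrary placeholder, the unit is assumed to be milliseconds, and the surrounding elements of the schema file are omitted, so check the shipped schema for the actual default and context before editing.

```xml
<!-- Placeholder value, assumed to be in milliseconds; not a recommended setting -->
<fail-over-find-timeout>1000</fail-over-find-timeout>
```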

