
Recommended approach to space recovery for applications

We are using XAP 6.5 GA in a pure Java environment. We currently deploy a primary and a backup space. What is the recommended approach to application recovery when all primary and backup spaces go down, even for an instant?

Currently, in each application, as part of our own space framework, a background thread periodically checks the space using the IJSpace.ping() method. If the ping fails, we sleep for a configured amount of time and then attempt to re-initialize the IJSpace using the SpaceFinder.find() method, repeating until we are successful. This seems to work for us, but it is impossible to know in which cases it will not work until one of them happens.
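For readers following along, the watchdog described above might look roughly like the sketch below. It is not our actual framework code. To keep it self-contained it does not depend on the GigaSpaces jars: `Pingable` is a hypothetical stand-in for the single `IJSpace.ping()` call, the `Callable` finder is a stand-in for `SpaceFinder.find()`, and the names `SpaceWatchdog`, `checkOnce`, `proxy`, `start` and `stop` are invented for illustration. It is written in current Java syntax rather than what XAP 6.5 itself requires.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the one proxy call the watchdog needs (IJSpace.ping()). */
interface Pingable {
    void ping() throws Exception;
}

/** Periodically pings the current proxy and looks up a new one when the ping fails. */
class SpaceWatchdog<T extends Pingable> {
    // Assumption: in a real client this would wrap SpaceFinder.find(url).
    private final Callable<T> finder;
    private final AtomicReference<T> current = new AtomicReference<>();
    // Daemon thread so the watchdog never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "space-watchdog");
                t.setDaemon(true);
                return t;
            });

    SpaceWatchdog(Callable<T> finder) {
        this.finder = finder;
    }

    /** Most recently obtained proxy, or null if no lookup has succeeded yet. */
    T proxy() {
        return current.get();
    }

    /**
     * One health check: ping the current proxy; if there is none or the ping
     * fails, attempt a single lookup. Returns true if a proxy that pinged or
     * was freshly found is in place, false if the lookup failed too.
     */
    boolean checkOnce() {
        T p = current.get();
        if (p != null) {
            try {
                p.ping();
                return true;
            } catch (Exception pingFailure) {
                // Fall through and try to obtain a fresh proxy.
            }
        }
        try {
            current.set(finder.call());
            return true;
        } catch (Exception lookupFailure) {
            return false; // Keep the old reference; the next tick retries.
        }
    }

    /** Runs checkOnce() repeatedly, waiting periodMillis between checks. */
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(this::checkOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Application code would read the proxy through `proxy()` on every use instead of caching it, so that a replacement made by the watchdog is picked up.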

If we do nothing then the Space proxy becomes unusable and never recovers on its own.

Is this an acceptable practice? I remember reading in this forum some time ago that you should call SpaceFinder.find() only once. The failure of all of our space nodes is rare, but it does happen on occasion, and I would like an elegant way to recover other than restarting each of our applications individually.

TIA

Mike

This thread was imported from the previous forum.

asked 2008-12-26 10:53:56 -0500

mcnoche

updated 2013-08-08 09:52:00 -0500

jaissefsfex

1 Answer


Mike,

When there is a partial cluster failure (a primary or a backup space fails), the GSM will re-provision the failed space and redeploy it to another existing GSC. I'm assuming you are deploying your application into a GSC and have the proper SLA in place.

In your application, in the case of a partial failure, you should not ping periodically and call SpaceFinder.find() to recover the proxy; the proxy does this implicitly. If your proxy becomes unusable, something is not configured correctly. I also suggest moving to the latest 6.6 release, which might fix the problem you are experiencing.

If the intention is to recover client connectivity with the space after a complete shutdown of the space cluster, your solution would work. Still, here is another way to handle it: wrap the business logic in a catch block for the relevant exceptions, and call SpaceFinder.find() (with a sleep and a retry count) once a remote exception or another related exception is thrown. This way you call SpaceFinder.find() only when necessary (when there was a complete shutdown of the spaces or the lookup service), and it will return a refreshed proxy.
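A minimal sketch of that catch-and-retry idea follows. It is only an illustration under stated assumptions, not GigaSpaces code: the `Callable` finder stands in for `SpaceFinder.find()`, the type parameter `S` stands in for the `IJSpace` proxy, and `SpaceOperation`, `RecoveringSpaceClient` and `execute` are hypothetical names. It catches `Exception` broadly to stay self-contained; a real client would narrow the catch to the remote and connectivity-related exceptions.

```java
import java.util.concurrent.Callable;

/** Business logic that uses a space proxy of type S and may throw. */
interface SpaceOperation<S, R> {
    R apply(S space) throws Exception;
}

/** Runs operations against a proxy and re-finds the proxy only after a failure. */
class RecoveringSpaceClient<S> {
    // Assumption: in a real client this would wrap SpaceFinder.find(url).
    private final Callable<S> finder;
    private final int maxRetries;
    private final long sleepMillis;
    private S space; // Current proxy; replaced only when an operation fails.

    RecoveringSpaceClient(Callable<S> finder, int maxRetries, long sleepMillis)
            throws Exception {
        this.finder = finder;
        this.maxRetries = maxRetries;
        this.sleepMillis = sleepMillis;
        this.space = finder.call(); // Initial lookup, done once up front.
    }

    /**
     * Runs op against the current proxy. On failure, sleeps, looks up a fresh
     * proxy and retries, up to maxRetries times. Rethrows the last failure if
     * every attempt fails.
     */
    synchronized <R> R execute(SpaceOperation<S, R> op) throws Exception {
        try {
            return op.apply(space);
        } catch (Exception first) {
            // Sketch only: narrow this to connectivity-related exceptions in
            // real code so that business errors are not retried.
            Exception last = first;
            for (int attempt = 1; attempt <= maxRetries; attempt++) {
                Thread.sleep(sleepMillis);
                try {
                    space = finder.call(); // Refresh the proxy.
                    return op.apply(space);
                } catch (Exception e) {
                    last = e;
                }
            }
            throw last;
        }
    }
}
```

Because a failed operation is run again after the proxy is refreshed, this shape suits operations that are safe to repeat; anything that may have partially completed before the failure needs its own handling.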

Here is a simple JMX-based agent you can use to monitor a clustered space. It uses a technique similar to the one described above:
http://www.gigaspaces.com/wiki/display/SBP/JMXSpaceStatistics

There are GigaSpaces integrations for Sun Grid Engine, CA Wily, Hyperic HQ, and others. You might want to use these for monitoring.

Please note that GigaSpaces 7.0, planned for Q2 2009, will provide a new admin API that will allow you to monitor all GigaSpaces components and their life cycle in a very simple manner. These include the PU, OS, Space, JVM, GSC, GSM, LUS, communication, etc. A milestone release is already available.

Shay

answered 2008-12-26 14:00:27 -0500

shay hassidim
