
Efficient way to keep two spaces in sync

What is the most efficient way to keep two spaces synchronized?

Entities that are kept in sync behave very differently in those two spaces:
* class implementations are not the same
* indexing is not the same

Let's assume objects with 5-50 fields, where usually no more than 5 fields change with each update.
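Because only a handful of the 5-50 fields change per update, shipping just the changed fields is where most of the bandwidth saving would come from. A minimal sketch in plain Java, not using any GigaSpaces API: the hypothetical `FieldDelta.diff` compares two versions of an object via reflection and returns a map of changed field names to their new values. `Trade` is an illustrative class only, and inherited fields are ignored for brevity.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative entity; any class with instance fields works.
class Trade {
    String symbol;
    double price;
    int quantity;

    Trade(String symbol, double price, int quantity) {
        this.symbol = symbol;
        this.price = price;
        this.quantity = quantity;
    }
}

// Hypothetical helper: returns only the fields whose values differ
// between two versions of the same object (field name -> new value).
final class FieldDelta {
    private FieldDelta() {}

    static Map<String, Object> diff(Object before, Object after) {
        if (before.getClass() != after.getClass()) {
            throw new IllegalArgumentException("both versions must have the same class");
        }
        Map<String, Object> changed = new HashMap<>();
        // Declared fields only: inherited fields are not compared in this sketch.
        for (Field f : after.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip class-level and compiler-generated fields
            }
            f.setAccessible(true);
            try {
                Object oldValue = f.get(before);
                Object newValue = f.get(after);
                if (!Objects.equals(oldValue, newValue)) {
                    changed.put(f.getName(), newValue);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read " + f.getName(), e);
            }
        }
        return changed;
    }
}
```

For example, diffing `new Trade("ABC", 10.0, 100)` against `new Trade("ABC", 10.5, 100)` yields a map holding only `price`, which is the kind of payload a partial update would need to carry.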

I'm looking for answers to these questions:
* Should I use pushing or pulling?
* Can I use partial update together with pulling (i.e. using a notify container)?

This thread was imported from the previous forum. For your reference, the original is available here: http://forum.openspaces.org/thread.jspa?threadID=2749

asked 2008-12-04 06:19:31 -0500

kaarelk

updated 2013-08-08 09:52:00 -0500

jaissefsfex

1 Answer


Kaarel,

Are these spaces located in two different sites?
Do you need to synchronize different space clusters located in different remote sites (over the WAN)?

You might need to use a combination of polling containers on each space, with some payload that will be sent to the other site.

I'm not sure how partial update will help here, since it is based on the UID of the object. You will need identical UIDs at both ends to be able to use this option.
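If partial update is tied to the UID, both ends must agree on it. One way to get there, sketched here under the assumption that every entity has a business key both sites share, is to derive the id deterministically from a logical type name and that key with `java.util.UUID.nameUUIDFromBytes`, so each site computes the same value on its own. `SharedIds` and its parameters are hypothetical, and whether a given space setup lets you supply such an id yourself is not covered here.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper: each site derives the same id from (logical type, business key)
// without any coordination, so the ids match on both ends.
final class SharedIds {
    private SharedIds() {}

    static String idFor(String logicalType, String businessKey) {
        // Length-prefix the type name so that different (type, key) pairs cannot
        // produce the same input string, even if the type name contains '/'.
        String name = logicalType.length() + ":" + logicalType + "/" + businessKey;
        // Name-based (version 3, MD5) UUID: the same input always yields the same UUID.
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8)).toString();
    }
}
```

The logical type name is used instead of the Java class name because, as noted in the question, the class implementations differ between the two spaces.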

Shay

answered 2008-12-04 08:33:19 -0500

shay hassidim

Comments

My current question is aimed at a LAN setup. But as data volumes will be large, we need to plan ahead a little.

Since you brought this up: if we need to do WAN synchronization, do you propose composing (and marshalling) our payload with custom code to optimize bandwidth?

kaarelk (2008-12-05 01:59:05 -0500)

Here is the basic approach you might want to consider: a "push" mechanism will be incorporated to propagate changes from one data center to the other. For each changed (i.e. written, updated or deleted) object, a “marker record” will be inserted into the space, indicating a change that needs to be propagated to the other data center.

The marker records will be keyed by object id so that if the same object is changed multiple times, the corresponding marker record will be updated multiple times. The intent of this approach is to ensure that only the latest state of every changed object is propagated across to the other site. If an object is changed multiple times between “pushes”, only one change (the latest) will be pushed to the other data center.
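The marker idea can be sketched in plain Java, with an in-memory map standing in for the space (an assumption; `Op`, `Marker` and `MarkerLog` are hypothetical names). Recording a change for an id that already has a pending marker replaces it, so a drained batch only ever carries the latest change per object. Since each marker carries the object's full latest state, the receiving side should apply WRITE and UPDATE as upserts; otherwise a write followed by an update, coalesced into a single UPDATE, would find nothing to update.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Kind of change a marker records.
enum Op { WRITE, UPDATE, DELETE }

// Hypothetical marker record: one per changed object id, holding its latest state.
final class Marker {
    final String objectId;
    final Op op;
    final Map<String, Object> state; // latest field values (null for DELETE)

    Marker(String objectId, Op op, Map<String, Object> state) {
        this.objectId = objectId;
        this.op = op;
        this.state = state;
    }
}

// Keeps at most one marker per object id, so only the latest change is pushed.
final class MarkerLog {
    private final LinkedHashMap<String, Marker> pending = new LinkedHashMap<>();

    synchronized void record(String objectId, Op op, Map<String, Object> state) {
        // Remove first so the id moves to the end: iteration order = last-change order.
        pending.remove(objectId);
        pending.put(objectId, new Marker(objectId, op, state));
    }

    // Removes and returns up to maxBatch markers, least recently changed first.
    synchronized List<Marker> drain(int maxBatch) {
        List<Marker> batch = new ArrayList<>();
        Iterator<Marker> it = pending.values().iterator();
        while (it.hasNext() && batch.size() < maxBatch) {
            batch.add(it.next());
            it.remove();
        }
        return batch;
    }

    synchronized int size() {
        return pending.size();
    }
}
```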

Periodically, a worker embedded in each space node will consume a batch of marker objects and write them into the other data center as “execution objects”. These will be consumed by a relevant worker that will execute the operation locally. This approach ensures that latency between the sites has only a minor impact on the synchronization delay (when the sites are connected over a WAN).
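A minimal sketch of the periodic worker, assuming a drain function (for example a marker log's `drain(int)`) and a send function that writes to the other site; both are hypothetical placeholders, and `ExecutionBatch` stands in for the “execution object”. Shipping one batch per tick means the WAN round trip is paid per batch rather than per change.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.IntFunction;

// Hypothetical "execution object": one batch of changes shipped to the other site.
final class ExecutionBatch<T> {
    final long sequence; // could let the receiver detect gaps or reordering
    final List<T> changes;

    ExecutionBatch(long sequence, List<T> changes) {
        this.sequence = sequence;
        this.changes = changes;
    }
}

// Periodically drains pending changes and ships each drained batch as one object.
final class BatchPusher<T> implements AutoCloseable {
    private final IntFunction<List<T>> drain;       // e.g. markerLog::drain (assumed)
    private final Consumer<ExecutionBatch<T>> send; // e.g. a write to the remote space (assumed)
    private final int maxBatch;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private long nextSequence = 0;

    BatchPusher(IntFunction<List<T>> drain, Consumer<ExecutionBatch<T>> send, int maxBatch) {
        this.drain = drain;
        this.send = send;
        this.maxBatch = maxBatch;
    }

    // Starts a tick every periodMillis, measured from the end of the previous tick.
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                runOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future ticks; a real
                // worker would log it and decide how to retry the lost batch.
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // One tick: drain up to maxBatch changes; returns false if nothing was pending.
    synchronized boolean runOnce() {
        List<T> changes = drain.apply(maxBatch);
        if (changes.isEmpty()) {
            return false;
        }
        send.accept(new ExecutionBatch<>(nextSequence++, changes));
        return true;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```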

Since you have different class implementations and different class indexing on both sides (and probably different fields), you might want to place the actual changes/deltas within a generic payload collection object (a HashMap) that captures the relevant fields and their data. The payload will be part of the execution object. On the target side, you will retrieve the relevant object and change its data via reflection or another technique (byte code / introspection…).
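The target-side step could look like the following sketch, assuming the payload is a `Map` from field name to new value and that field names match between the two class versions (a hypothetical convention; a real setup may need an explicit name mapping). Names the local class lacks are skipped, which is what lets the two sides keep differently shaped classes. `DeltaApplier` and `TargetTrade` are illustrative names.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical target-side helper: copies each entry of a generic delta payload
// (field name -> new value) onto an object of the local class via reflection.
final class DeltaApplier {
    private DeltaApplier() {}

    // Returns the number of fields applied; names the local class lacks are skipped.
    static int apply(Object target, Map<String, Object> delta) {
        int applied = 0;
        for (Map.Entry<String, Object> e : delta.entrySet()) {
            Field f = findField(target.getClass(), e.getKey());
            if (f == null) {
                continue; // field not present in this side's class version
            }
            f.setAccessible(true);
            try {
                f.set(target, e.getValue()); // unboxes/widens for primitive fields
                applied++;
            } catch (IllegalAccessException ex) {
                throw new IllegalStateException("cannot set " + e.getKey(), ex);
            }
        }
        return applied;
    }

    // Looks the field up in the class and then its superclasses.
    private static Field findField(Class<?> type, String name) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException ignored) {
                // keep searching up the hierarchy
            }
        }
        return null;
    }
}

// Illustrative local class on the target site: same logical entity, different shape.
class TargetTrade {
    String symbol;
    double price;
    String desk; // exists only on this side
}
```

Applying `{price=10.5, quantity=200}` to a `TargetTrade` sets `price` and skips `quantity`, since this side's class has no such field.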

Our general approach to clusters over the WAN is described here: http://www.gigaspaces.com/wiki/displa...

Each option described there targets a different scenario and use case.

Note the content router approach mentioned there. It will soon be provided as a solution by one of our partners. Let me know if you are interested in this powerful option.

Shay

shay hassidim (2008-12-05 16:17:04 -0500)

Kaarel,

Let's not forget that we also have the Mirror Gateway approach, which might work for you as well. The Mirror will be implemented to push changes from one data center to the other. The Mirror receives the local cluster's primary space operations as a collection (which includes both the operation and the relevant object elements). This collection will be scanned to convert the original objects to a different format (you can't send the objects as-is, since the other side has different versions of these classes loaded into its classpath). The converted collection will be sent to the destination site within an Execution object (via a one-way async replication channel or a regular space operation).
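The conversion step can be sketched without the Mirror API itself: `MirrorOperation` is a hypothetical stand-in for one entry of the collection the Mirror receives, and each entity is flattened into its type name plus a map of field values. The sketch assumes field values are JDK types such as `String` and numbers, so the result can be deserialized on a classpath that lacks the source classes. `SourceTrade` is illustrative only.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

enum OpType { WRITE, UPDATE, REMOVE }

// Hypothetical stand-in for one entry in the collection the Mirror receives.
final class MirrorOperation {
    final OpType type;
    final Object entity; // instance of the source site's class

    MirrorOperation(OpType type, Object entity) {
        this.type = type;
        this.entity = entity;
    }
}

// Class-neutral form of the same operation: type name plus field values.
final class NeutralOperation {
    final OpType type;
    final String typeName;
    final Map<String, Object> fields;

    NeutralOperation(OpType type, String typeName, Map<String, Object> fields) {
        this.type = type;
        this.typeName = typeName;
        this.fields = fields;
    }
}

// Illustrative source-side class.
class SourceTrade {
    String symbol;
    double price;

    SourceTrade(String symbol, double price) {
        this.symbol = symbol;
        this.price = price;
    }
}

final class MirrorConverter {
    private MirrorConverter() {}

    // Converts the whole collection; the result would go into an Execution object.
    static List<NeutralOperation> convert(List<MirrorOperation> batch) {
        List<NeutralOperation> out = new ArrayList<>(batch.size());
        for (MirrorOperation op : batch) {
            out.add(new NeutralOperation(op.type,
                    op.entity.getClass().getSimpleName(), toFieldMap(op.entity)));
        }
        return out;
    }

    // Copies instance fields (including inherited ones) into a plain map.
    static Map<String, Object> toFieldMap(Object entity) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (Class<?> c = entity.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()
                        || fields.containsKey(f.getName())) {
                    continue; // subclass field shadows a same-named superclass field
                }
                f.setAccessible(true);
                try {
                    fields.put(f.getName(), f.get(entity));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return fields;
    }
}
```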

At the other site, a worker will consume the Execution objects and execute the operations, with their corresponding change/delta objects, against the local cluster.
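On the receiving site, the worker can be sketched as a `Runnable` that blocks on an inbound queue and applies each change in order; a plain map stands in for the local cluster, and `ChangeType`, `Change`, `ExecutionObject` and `ExecutionWorker` are hypothetical names. Updates merge only the delta fields into the existing entry, matching the idea of sending changes rather than whole objects.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;

enum ChangeType { UPSERT, DELETE }

// One change inside an execution object: target id plus the changed fields.
final class Change {
    final ChangeType type;
    final String id;
    final Map<String, Object> delta; // ignored for DELETE

    Change(ChangeType type, String id, Map<String, Object> delta) {
        this.type = type;
        this.id = id;
        this.delta = delta;
    }
}

// Hypothetical execution object shipped from the other site.
final class ExecutionObject {
    final List<Change> changes;

    ExecutionObject(List<Change> changes) {
        this.changes = changes;
    }
}

// Target-site worker: blocks on the inbound queue and applies each execution
// object to the local store (a map standing in for the local cluster).
final class ExecutionWorker implements Runnable {
    private final BlockingQueue<ExecutionObject> inbound;
    private final Map<String, Map<String, Object>> localStore;

    ExecutionWorker(BlockingQueue<ExecutionObject> inbound,
                    Map<String, Map<String, Object>> localStore) {
        this.inbound = inbound;
        this.localStore = localStore;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                execute(inbound.take()); // waits for the next execution object
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and stop
        }
    }

    // Applies changes in order; assumes a single worker writes to localStore.
    void execute(ExecutionObject exec) {
        for (Change c : exec.changes) {
            if (c.type == ChangeType.DELETE) {
                localStore.remove(c.id);
            } else {
                // Merge only the changed fields into the existing entry, creating it if needed.
                localStore.computeIfAbsent(c.id, k -> new HashMap<>()).putAll(c.delta);
            }
        }
    }
}
```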

Shay

Edited by: Shay Hassidim on Dec 7, 2008 6:19 AM

shay hassidim (2008-12-07 06:19:10 -0500)

