Partitioning, Space Sizing strategies

Hi,

I have a few queries about data grid partitioning and sizing each space within the cluster.

1. When partitioning is applied, the documentation states that object distribution is based on a hash key. How does this ensure that objects are distributed evenly across multiple partitions? In short, I want to understand whether N partitions would carry an equal memory load under this strategy.

2. Is there a way to limit the memory usage per partition of the cluster? The main motives behind this question are:

a. There must be a manageable way to determine whether a particular partition is growing without bound when performance starts degrading. Management applications built around GigaSpaces could then raise alarms whenever the monitored memory level of a particular partition gets close to its imposed maximum limit.

b. This would also give large HA systems an operational indication for planning cluster expansion. For example, whenever we find that a particular space partition is using x% of its allocated memory, we can plan to add partitions as required, rather than running into a sudden failure situation one fine morning.

c. I am assuming that a partition running within a single JVM instance would start exhibiting performance issues if its memory is allowed to grow without any limit (JVM GC could kick in at the most unexpected times and cause CPU spikes, leading to unpredictable performance characteristics).

Thank you in advance for your patience in reading these relatively long queries.

Regards
Muthu

This thread was imported from the previous forum.
For your reference, the original is available here

asked 2008-11-04 22:22:22 -0500 by muthukmk
updated 2013-08-08 09:52:00 -0500 by jaissefsfex

1 Answer


1. If the hash values of your routing key field are not evenly distributed, the space partitions will not hold an even number of objects. So you need to pick a routing field whose hash code yields many different values, or at least more distinct values than the number of partitions you might have. In many cases we recommend having a numeric, integer-based field act as the routing field; this could accommodate a session ID or an order ID. If you would like to route multiple objects into the same partition (to avoid the use of distributed transactions), you should use the same routing value for all objects in the transaction and use a local transaction manager. In such a case the routing field might be a common business value such as a user ID or a trade ID.
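For illustration only (the Order class and customerId field below are hypothetical, not taken from this thread), a POJO declares its routing field with the @SpaceRouting annotation; the field value's hash code, modulo the number of partitions, determines the target partition:

    import com.gigaspaces.annotation.pojo.SpaceClass;
    import com.gigaspaces.annotation.pojo.SpaceId;
    import com.gigaspaces.annotation.pojo.SpaceRouting;

    @SpaceClass
    public class Order {
        private String orderId;
        private Integer customerId;

        @SpaceId(autoGenerate = false)
        public String getOrderId() { return orderId; }
        public void setOrderId(String orderId) { this.orderId = orderId; }

        // Routing field: all orders of the same customer land in the same
        // partition, since the routing value's hash code picks the partition.
        @SpaceRouting
        public Integer getCustomerId() { return customerId; }
        public void setCustomerId(Integer customerId) { this.customerId = customerId; }
    }

With this in place, a transaction that touches only objects sharing one customerId stays within a single partition and can use a local transaction manager.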

There are situations where there is no handy routing field to use; in such cases you might end up picking a random value, or a unique value (using some sequencer), as the routing field value. This makes sure the objects are routed to the different partitions in an even manner.
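As a minimal sketch of that fallback (the RoutingSequencer helper is assumed, not part of the product API), a simple counter can supply the routing value; consecutive values map round-robin across the partitions because routing follows the value's hash modulo the partition count:

    import java.util.concurrent.atomic.AtomicLong;

    public class RoutingSequencer {
        private static final AtomicLong SEQ = new AtomicLong();

        // Set the returned value on the object's @SpaceRouting field before writing;
        // incrementing values spread writes evenly over the partitions.
        public static long nextRoutingValue() {
            return SEQ.incrementAndGet();
        }
    }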

2. You may limit the amount of data stored within a partition based on a fixed maximum number of objects (not very useful when you have variable object sizes) or based on a maximum used-memory percentage (out of the Xmx size). See the memory management facility page on the wiki for more details: http://www.gigaspaces.com/wiki/displa...
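As a rough sketch, and assuming the space-config.engine.memory_usage.* property names from the memory management wiki page above (verify them against your XAP version; the space name "/./mySpace" is just a placeholder), the percentage-based limit can be set when the space is started:

    import java.util.Properties;

    import org.openspaces.core.GigaSpace;
    import org.openspaces.core.GigaSpaceConfigurer;
    import org.openspaces.core.space.UrlSpaceConfigurer;

    public class MemoryLimitedSpace {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Enable the memory manager; thresholds are percentages of the JVM -Xmx size.
            props.setProperty("space-config.engine.memory_usage.enabled", "true");
            props.setProperty("space-config.engine.memory_usage.high_watermark_percentage", "90");
            props.setProperty("space-config.engine.memory_usage.write_only_block_percentage", "85");
            props.setProperty("space-config.engine.memory_usage.low_watermark_percentage", "75");

            UrlSpaceConfigurer spaceConfigurer =
                    new UrlSpaceConfigurer("/./mySpace").addProperties(props);
            GigaSpace gigaSpace = new GigaSpaceConfigurer(spaceConfigurer.space()).gigaSpace();

            // ... write/read objects; writes are rejected once the
            // write-only watermark is crossed.
        }
    }

A monitoring agent (point 2a/2b of the question) can watch the same memory levels and alarm well before the block threshold is reached.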

Please note that with GigaSpaces the policy for expanding the in-memory storage capacity is to add additional containers and move existing partitions into these newly started containers (i.e., a "logical re-hashing" of the partition locations, not a physical re-hashing of the different space objects across multiple partitions). We do not support adding partitions to a running cluster, due to the instability such an activity may cause (re-hashing, client updates, phantom GC activity, etc.). Instead, a relevant trigger could be placed within an agent running as part of some monitoring system to start a new container on an available machine and initiate a relocation policy, based on some SLA, that expands the cluster capacity in real time.

Shay

Edited by: Shay Hassidim on Nov 5, 2008 12:33 AM

answered 2008-11-05 00:21:49 -0500 by shay hassidim
