Generate ID (int32, .net)

Are there any general guidelines for writing an ID (32-bit integer) generator for space objects? (There seems to be nothing like this out of the box.) It should be thread-safe on the client and work across multiple space clients.

I saw that the pet clinic project has a Java implementation addressing the same issue. Is there a .NET reference?

This thread was imported from the previous forum. For your reference, the original is available here: http://forum.openspaces.org/thread.jspa?threadID=3112

asked 2009-05-28 22:35:01 -0500

theo

updated 2013-08-08 09:52:00 -0500

jaissefsfex

1 Answer


There's no out-of-the-box implementation for this. However, you can implement it quite easily with a PU that acts as an ID generator service, as long as you don't mind that every ID you create requires a call to a remote service in the space rather than to something within the client.

Here's my idea:
You can have a PU deployed as a single instance in the cluster. That PU holds a remote clustered proxy with a view of the entire cluster and runs a polling container that polls for a GenerateID request object. Once it receives such an object, it returns an ID result object. The client code that needs an ID writes a GenerateID request object to the cluster, waits for the ID result object to be written back, and takes it.

The ID generator PU needs to keep a single state object inside the space that holds the last generated ID, and it updates that value in the cluster every time it generates an ID. This ensures that if the PU's machine goes down and the PU is restarted in a different GSC, it continues from the same state.
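
The flow described above can be sketched roughly as follows. This is a minimal Python sketch of the logic only, not XAP code: `FakeSpace`, `IdGeneratorService`, and `handle_generate_id` are hypothetical names, and the dictionary guarded by a lock merely stands in for a transactional take and write of the state object in the space.

```python
import threading


class FakeSpace:
    """Hypothetical stand-in for the space: a thread-safe store of named entries."""

    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()

    def update(self, key, fn, default):
        # Atomic read-modify-write of one entry, standing in for a
        # take + write of the state object inside one transaction.
        with self._lock:
            value = fn(self._entries.get(key, default))
            self._entries[key] = value
            return value


class IdGeneratorService:
    """Handles GenerateID requests; all state lives in the space, none in the service."""

    STATE_KEY = "last_generated_id"

    def __init__(self, space):
        self._space = space

    def handle_generate_id(self):
        # Advance the shared counter and return the new value as the ID result.
        return self._space.update(self.STATE_KEY, lambda last: last + 1, default=0)


space = FakeSpace()
service = IdGeneratorService(space)
first = service.handle_generate_id()
second = service.handle_generate_id()

# Simulate the service being restarted elsewhere: new object, same space.
restarted = IdGeneratorService(space)
third = restarted.handle_generate_id()
```

Because the counter is stored in the space rather than in the service object, the restarted instance continues the sequence instead of starting over.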

You can run it multithreaded if the load is heavy, make it transactional, and so on. This is easily configurable with the polling container; it is simply a matter of attributes.
All the details about the polling container can be found at: http://www.gigaspaces.com/wiki/display/XAP7NET/PollingContainerComponent
I recommend using 7.0 because we enhanced the event listener container a bit with respect to transactions. 7.0 rc1 is already out.

Please pay attention to the Non-Blocking receive handler part in the docs if you are taking the clustered approach for the service.

Eitan

answered 2009-05-29 03:18:27 -0500

eitany

Comments

Hello Eitan,

Hmm... a lot of new concepts to digest. Have you considered providing a reference implementation for this? Every relational database can do it. Besides, ID-based objects are a fundamental concept in almost any kind of application semantics, so it seems that anyone using the space for storage would encounter this problem in one form or another.

Also, I would like to implement it without using a processing unit (a la trusted client-style).

Besides, I think you can batch-allocate (i.e. batch-reserve) ID numbers on a client to avoid repeated queries when allocating a large number of IDs. The centralized mechanism could hand out range blocks as well as individual numbers. Asking the PU for every single ID is a waste of queries if you want to batch-insert.
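
A minimal sketch of this batch-reservation idea, in Python for brevity. `CentralCounter` and `BlockIdAllocator` are hypothetical names, and the central counter is an in-process object here; in a real deployment it would be the remote ID service, so each `reserve` call would be one remote round trip.

```python
import threading


class CentralCounter:
    """Hypothetical central authority; one call reserves a whole range."""

    def __init__(self):
        self._next = 1
        self._lock = threading.Lock()
        self.calls = 0  # counts round trips, for illustration only

    def reserve(self, count):
        with self._lock:
            start = self._next
            self._next += count
            self.calls += 1
            return start, start + count  # half-open range [start, end)


class BlockIdAllocator:
    """Client-side allocator: hands out IDs locally and refills in blocks."""

    def __init__(self, central, block_size=100):
        self._central = central
        self._block_size = block_size
        self._next = 0
        self._end = 0  # empty block forces a reservation on first use
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:  # thread-safe within one client
            if self._next >= self._end:
                self._next, self._end = self._central.reserve(self._block_size)
            value = self._next
            self._next += 1
            return value


central = CentralCounter()
client_a = BlockIdAllocator(central, block_size=100)
client_b = BlockIdAllocator(central, block_size=100)
ids_a = [client_a.next_id() for _ in range(150)]
ids_b = [client_b.next_id() for _ in range(150)]
```

Here 300 IDs cost four calls to the central counter instead of 300. The trade-off is that IDs left in a block are lost if the client exits, and IDs are not ordered across clients.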

Ultimately, beyond just allocating, there should be some type of de-allocation mechanism (which raises a lot of other questions, such as fragmentation and compacting in a distributed environment).

Edited by: theo w on Jun 1, 2009 2:16 PM

Edited by: theo w on Jun 1, 2009 2:19 PM

theo (2009-06-01 01:14:25 -0500)

Regardless of your question, are you aware of the AutoGenerate=true property you can set on the SpaceID field? It is string-based rather than Int32-based; is that not good enough in your case?

You can implement the same logic I described in client code running in an executable rather than a PU. It is exactly the same: the client holds a clustered proxy and executes the same logic with a polling container inside it. As I described above, the PU is simply an integration point with the service grid. You can have much the same thing running in separate clients that work remotely on the cluster, but keep in mind that if you deploy the grid and a remote client separately, the client cannot be collocated with the space it works on, so every operation is a remote call.

You can easily extend the ID generating logic I described above to fit your requests. You can pass the number of IDs you want generated in the IDRequest object and return an array of IDs in the result object. You can also create releaseID/releaseIDs commands and manage this logic in the ID generator client. For instance, you can have a heap-based generator that is initialized with objects representing keys 0 to 10000. Every time an ID is needed, it is taken from the heap, and when an ID is released, it is returned to the heap. If the heap is empty, the generator remembers that the last key in the heap was 10000, inserts keys 10001 to 20000 into the heap, and repeats the process.
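
The heap-based generator with release might look roughly like this. It is a Python sketch under assumptions: `IdPool`, `acquire`, and `release` are hypothetical names, the pool lives in one process, and the chunk size is shrunk to 3 in the usage lines so the refill step is easy to follow.

```python
import heapq
import threading


class IdPool:
    """Min-heap of free IDs; refills with the next chunk when it runs empty."""

    def __init__(self, chunk=10000):
        self._chunk = chunk
        # Keys 0..chunk; an ascending list already satisfies the heap invariant.
        self._free = list(range(0, chunk + 1))
        self._last = chunk  # highest key ever placed in the heap
        self._in_use = set()
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if not self._free:
                # Heap exhausted: insert the next chunk (e.g. 10001..20000).
                start = self._last + 1
                self._last += self._chunk
                self._free = list(range(start, self._last + 1))
            value = heapq.heappop(self._free)
            self._in_use.add(value)
            return value

    def release(self, value):
        with self._lock:
            if value not in self._in_use:
                raise ValueError(f"ID {value} is not currently allocated")
            self._in_use.remove(value)
            heapq.heappush(self._free, value)


pool = IdPool(chunk=3)                      # free keys start as 0..3
taken = [pool.acquire() for _ in range(5)]  # fifth call triggers a refill with 4..6
pool.release(1)
reused = pool.acquire()                     # smallest free key comes back first
after = pool.acquire()
```

Note that a released ID is reused immediately, so any object still holding the old value as a reference would silently point at the new owner; whether that is acceptable depends on the application.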

I might be able to write a reference example later on. However, do you understand what I am suggesting well enough to start working on it? I am assuming you already know about AutoGenerate=true and that it does not fit your case. If not, let me know and we can discuss this offline.

Eitan

eitany (2009-06-01 01:40:02 -0500)

Theo,

Comparing GigaSpaces to a traditional relational database is misleading in this case. Implementing a unique sequential identity in a non-partitioned environment (traditional databases) is easy, but in a partitioned environment (GigaSpaces) it is tricky, error-prone, and will most likely create a performance bottleneck, since it requires some sort of central synchronization across the cluster.

DBAs usually tune databases by partitioning the tables, which means the identity is unique in the partition but not across the entire database. To solve this, they usually add an additional partition id, and use the combination of identity+partition id as a cluster unique identifier.
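
One way to picture the identity + partition id combination is to pack both into a single integer. The sketch below is an illustration only: the 7/24 bit split and the names `make_id` and `split_id` are assumptions chosen so that the result stays within the positive range of a signed 32-bit integer; nothing here reflects how GigaSpaces or any particular database does it.

```python
PARTITION_BITS = 7    # assumption: at most 128 partitions
SEQUENCE_BITS = 24    # leaves 2**24 local identities per partition
MAX_PARTITION = (1 << PARTITION_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def make_id(partition_id, local_sequence):
    """Combine a partition id and a per-partition identity into one int."""
    if not 0 <= partition_id <= MAX_PARTITION:
        raise ValueError("partition_id out of range")
    if not 0 <= local_sequence <= MAX_SEQUENCE:
        raise ValueError("local_sequence out of range")
    return (partition_id << SEQUENCE_BITS) | local_sequence


def split_id(value):
    """Recover (partition_id, local_sequence) from a combined id."""
    return value >> SEQUENCE_BITS, value & MAX_SEQUENCE


combined = make_id(3, 42)
largest = make_id(MAX_PARTITION, MAX_SEQUENCE)
```

Each partition can then generate its local sequence without coordinating with the others, at the cost of a smaller ID range per partition.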

Our out-of-the-box solution is the [SpaceID] attribute, which uses the property to generate a unique id across the cluster. If you want an automatic identity, you can use [SpaceID(AutoGenerate=true)], and we'll generate a unique cluster identity for you (this is what Eitan suggested). This has a couple of limitations:
- The property has to be of type String (I know you've asked for an int, but developers often ask for an int because they're used to databases, and actually have no problem with String ids).
- In a partitioned cluster, you have to declare a different property as [SpaceRouting], since [SpaceID(AutoGenerate=true)] cannot be used for routing (the value is generated at the server). If you use [SpaceID(AutoGenerate=false)] or [SpaceID] there's no such limitation.

If you insist on an autogenerated int identity, you need to implement something yourself (such as what Eitan suggested with the PU), since this is not a main path in the product.

Niv

niv (2009-06-02 05:43:39 -0500)

The reason I bother about this is that any inefficiency in the scheme will be a price to pay everywhere (memory, disk, network, processing time, etc).

I have seen that there is AutoGenerate=true and that every object gets its own ID, like "a1366531-5805-4f2e-8b40-e6b9b36f78fb". This would meet the requirements of a unique, auto-generated identity perfectly well, but it is not as space-efficient as a 32-bit ID and hence inevitably slower to process. It will certainly impact the performance of the overall application (in our case), since it will waste memory, CPU, network, and disk resources. We need lots of navigation across space objects, and the ability to store the same ID multiple times in other objects is essential. The IDs would have to act as pointers (references) to other objects, and it should be possible to persist and re-persist them. It appears inevitable that there has to be an authority to assign such ID numbers.

Assuming this is a hex identity, we have 8 + 4 + 4 + 4 + 12 hex digits, hence 32 hex positions, thus 128 bits. (Is that correct?) A 32-bit ID takes only 25% of the storage space of this ID, so it is considerably more efficient. The first computer science class teaches you to use the smallest data size required. Every database system I can think of gives the same advice: use the minimum column width. Otherwise, why bother with different data types? Is there any particular reason why 128 bits was chosen and not 32 or 64 bits? Is it due to routing or clustering?
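
The arithmetic can be checked against the sample ID quoted earlier in this comment. This short Python sketch only counts digits and bytes; it assumes the value is a standard hyphenated hexadecimal identifier and says nothing about how the space stores it internally.

```python
import uuid

sample = "a1366531-5805-4f2e-8b40-e6b9b36f78fb"

hex_digits = sample.replace("-", "")
bits = len(hex_digits) * 4               # each hex digit carries 4 bits

parsed = uuid.UUID(sample)               # parses as a standard 128-bit identifier
binary_bytes = len(parsed.bytes)         # size if stored in binary form
text_chars = len(sample)                 # size if stored as text, hyphens included

ratio = 32 / bits                        # 32-bit ID relative to the binary form
```

The count comes out at 32 hex digits, so 128 bits (16 bytes) in binary form, and the 25% figure holds for that form. Kept as a string, the same ID is 36 characters, so the gap to a 32-bit integer is wider still; the exact in-memory cost depends on the runtime's string encoding.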

I saw this request come up before in another thread: [/question/5990/best-practice-for-unique-id-generation/]. The answers were the same as here: "it depends", "your requirements". The prevailing wisdom seems to be that ID number generation is completely different from user to user and that auto-generated IDs are old-fashioned and "database-centered". But in fact it is possible to draw much stronger conclusions. Unless there are specific reasons otherwise, every database table tends to be defined with an ID column that is a 32-bit value and acts as the primary key. There are many reasons for this, which are highly factual and can be justified in great detail. For example: it fits the CPU register size and is hence extremely fast; it is a value type and goes on the stack, which is fast (no garbage collection); and it creates a very wide array space with high cardinality for indexing, e.g. in B-trees. Whether the IDs are sequential or not is not the issue; they simply have to be unique and efficient.

Oracle databases also have an internal "ROWID" which is system-generated. It resembles the automatically generated ID in the space. But it is just a pseudocolumn, and using it as a primary key is not advised: http://download.oracle.com/docs/cd/B2...

In the pet store sample, the reference documentation indicates that an ID generator was necessary to ...

theo (2009-06-02 23:21:42 -0500)

The comparison with databases is functional, not technical (we all know they are different). If the space is to work as an abstraction of a data store (e.g. IMDG), it is fair to compare it with other data stores, and any database can easily auto-generate int32 ID numbers out of the box. The statement "it's tricky, error-prone, and most likely will create a performance bottleneck" sounds rather subjective to me (how can I verify it?). After all, the cluster is supposed to be the point which is easy, error-free and high-performance (these are also subjective terms).

Although many developers ask for an int and get a string, they should be aware of the efficiency loss. Sorry if my post comes across as insisting. I am just pointing out that the auto-identity generation scheme which is built into the product is not really optimal in data efficiency terms (which has nothing to do with anyone's needs or requirements, the deficiency is intrinsic to GS/JS).

Anyway, thanks for all the advice. I'll search around for other references on generating int32 numbers and how to apply them in a clustered space, if possible.

theo (2009-06-02 23:35:47 -0500)
