
Trying to get a PaaS setup working

Some time ago I ran across Nati Shalom’s blog on PaaS ([ http://natishalom.typepad.com/nati_sh... ]) and was intrigued, since I had been thinking about something similar for a while. More recently Mr. Shalom made another post ([ http://natishalom.typepad.com/nati_sh... ]) that went into much more detail, so I decided to give this a try. Unfortunately, it’s been a bit more difficult to set up than I anticipated. Along the way I’ve accumulated some comments and questions.

The fictional application that I’m building for this proof of concept has the following relevant attributes:
* It’s a web application.
* The data set is extremely large but partitioning is feasible.
** About 20% of the data that users will access will be common to all users, with the other 80% being specific to their organization. The application will support a very large number of organizations.

Keeping in mind that I’m extremely green when it comes to datagrid technology, especially GigaSpaces’ variant of it, here’s what I'm assuming is a typical setup for this kind of situation: a standalone master space (I’ll refer to it as the DataSpace) that is partitioned across multiple VMs (likely on multiple machines). Each client (the client being a clustered instance of the web application) would have a local cache ([ http://www.gigaspaces.com/wiki/displa... ]). The DataSpace would be mirrored asynchronously to a database ([ http://www.gigaspaces.com/wiki/displa... ]). Any suggestions for how this setup might be improved are welcome, of course. I was attempting to stay consistent with what Mr. Shalom described in the second blog post I referenced above; however, I could easily have misinterpreted something.

As I mentioned above, it’s taken me much longer than I thought it would to get this working, especially considering the excellent quality of the GigaSpaces documentation. The problem has been that the docs assume that you are already familiar with all of the concepts and terminology related to this technology, which of course I wasn’t. Finding all of the pieces of the puzzle to fill in the holes in what I didn’t know was a little difficult. It seems like there’s a slight gap in what is otherwise incredibly thorough documentation. There are excellent conceptual overviews, and then there are highly detailed examples, but very little transition between the two, especially with regard to OpenSpaces. It wasn’t too difficult to go from the GigaSpaces overview pages into the low-level documentation, but stepping over to OpenSpaces caused me some problems. Then again, that could just be me. I seem to be pretty thick-headed when it comes to these things. ;)

I just barely have this configuration up and running now with the most simple of scenarios. I’ll be expanding it a bit with more complex domain objects and then I'll be doing some performance and consistency testing.

Questions:

1) Does all of the data from the database exist in the DataSpace or is it only loaded the first time it’s requested? In the latter case, does data eventually expire or timeout?

2) What happens when the DataSpace is “full”? Does it act like a cache where “least-used” or “first-in-first-out” rules are applied, or does it simply block the addition of new data until room can be made?

3) What controls the size of the DataSpace? Is it purely a matter of the amount of memory allocated to the VM that the GSC resides in?

4) Is there a way to dynamically grow or shrink the DataSpace? Would there be a reasonable need to do so? I’m using the <os-sla:sla> Spring descriptor to specify the number of instances and backups. This would lead me to believe that I would need to modify that tag to change the number of partitions.

5) Using OpenSpaces configuration in the client (in this case that’s the web application) makes a lot of sense. It’s an easy way to connect my application services to the DataSpace. However, is there any real benefit to configuring the standalone DataSpace and Mirror Service using OpenSpaces as opposed to using the native GigaSpaces XML descriptors and strategies?

6) I am using the POJO class and field level annotations ([ http://www.gigaspaces.com/wiki/displa... ]), but I’m not sure I fully understand why. Is this metadata needed in order to store an object in a space? It doesn’t seem like that’s the case because I recall a couple of the tutorials that don’t use this kind of metadata. I believe I may have read that this metadata is present so that the objects described can be manipulated by the JavaSpace and GigaSpace APIs? Is that correct?

Thanks!

Matt Welch

This thread was imported from the previous forum. For your reference, the original is available here: http://forum.openspaces.org/thread.jspa?threadID=1901

asked 2008-04-25 16:02:45 -0500

mwelch

updated 2013-08-08 09:52:00 -0500

jaissefsfex

1 Answer


Matt,
The product includes various ready-made examples and tutorials that you should use as a starting point instead of building your application from scratch. See http://gigaspaces.com/wiki/display/GS6/WelcometoGigaSpacesQuickStartGuide.

When using a local cache, take into consideration that it is relevant when at least 80% of your read requests are repeated reads of the same objects. Data is loaded into the client-side cache on demand and evicted from it based on cache size or memory usage; see the local cache defaults for details. If the desired object is not found in the client cache, it is loaded from the master space, provided it exists there. The client pushing data into the space should also have its local cache enabled.
If the client can provide a filter (SQL based) describing the data set it needs to maintain, the local view should be used. The master space would not be accessed in such a case. See the local view for details.
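
The on-demand loading and size-based eviction described above can be sketched in plain Java. This is only an illustration of the read-through pattern, not the GigaSpaces local cache implementation: the class name `ReadThroughLruCache`, the `master` function standing in for a read against the master space, and the entry-count limit are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative read-through cache with LRU eviction (hypothetical, not GigaSpaces code). */
final class ReadThroughLruCache<K, V> {
    private final Function<K, V> master; // stand-in for a read against the master space
    private final Map<K, V> entries;
    private int masterReads = 0;

    ReadThroughLruCache(final int maxEntries, Function<K, V> master) {
        this.master = master;
        // accessOrder = true keeps the least recently used entry first
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once the size limit is crossed
            }
        };
    }

    /** Returns the cached value, loading it from the master on a miss. */
    V read(K key) {
        V cached = entries.get(key);
        if (cached != null) {
            return cached;
        }
        masterReads++;
        V loaded = master.apply(key);
        if (loaded != null) {
            entries.put(key, loaded);
        }
        return loaded;
    }

    int masterReads() {
        return masterReads;
    }

    boolean isCached(K key) {
        return entries.containsKey(key); // does not change the access order
    }

    public static void main(String[] args) {
        ReadThroughLruCache<String, String> cache =
                new ReadThroughLruCache<>(2, key -> "value-of-" + key);
        cache.read("a");
        cache.read("a"); // repeated read, served from the cache
        System.out.println("reads that reached the master: " + cache.masterReads());
    }
}
```

The sketch also shows why the 80% guideline matters: only repeated reads of the same key avoid a trip to the master.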

Please note that we are constantly improving the wiki documentation to give it a better flow, with pointers to the relevant prerequisite concepts, and we will soon provide smart labels and a more sophisticated search.

Specifically answering your questions:
1. When running in ALL_IN_CACHE cache policy mode, all the data should be loaded into the space as part of the ExternalDataSource.initialLoad phase. This is the preferred mode, since it gives you deterministic behavior and the best performance, and it lets the space's in-memory query processor work in an optimal manner.
When running in LRU cache policy mode, data will be loaded from the database on demand. This approach is good when you are reading single objects rather than running queries (readMultiple).
2. When running in ALL_IN_CACHE cache policy mode with the memory usage manager turned on (the default starting with 6.5), the space will block write operations to keep the space JVM from running into an OutOfMemoryError. Read and take operations are still allowed, so you can read and remove data from the space.

When running in LRU mode, data will be evicted when the total number of objects within the space (a single space instance) crosses the cache size, or when memory usage crosses the high watermark percentage.

When the ExternalDataSource is used, persistent entries are not evicted when their lease expires. This limitation will be lifted in future versions.
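
To make the "block writes, still allow read and take" behavior concrete, here is a minimal plain-Java sketch. It is not GigaSpaces code: `WriteBlockingStore` is a hypothetical class, and a simple entry-count limit stands in for the memory usage threshold that the real space monitors.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative store that rejects new writes when full (hypothetical, not GigaSpaces code). */
final class WriteBlockingStore {
    private final Map<String, String> entries = new HashMap<>();
    private final int maxEntries; // assumption: entry count stands in for memory usage

    WriteBlockingStore(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Writes an entry; a new key is rejected once the store is full. */
    void write(String key, String value) {
        boolean isNewKey = !entries.containsKey(key);
        if (isNewKey && entries.size() >= maxEntries) {
            throw new IllegalStateException("store is full; write rejected");
        }
        entries.put(key, value);
    }

    /** Reads are always allowed, even when the store is full. */
    String read(String key) {
        return entries.get(key);
    }

    /** Take removes the entry and frees room for later writes. */
    String take(String key) {
        return entries.remove(key);
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        WriteBlockingStore store = new WriteBlockingStore(1);
        store.write("a", "1");
        try {
            store.write("b", "2");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        store.take("a"); // frees room
        store.write("b", "2");
        System.out.println("entries after take and write: " + store.size());
    }
}
```

The contrast with LRU mode is that nothing is evicted here: the caller has to take data out before new data fits.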

3. The size of the data space is bounded by the heap size of the JVM hosting it. When running a partitioned space, the amount of data that can be stored within the space is the aggregate of the heap sizes of all the primary space JVMs. You can always shrink the footprint via compaction techniques. Let us know if you need help with this.
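
As a rough worked example of that aggregation, the arithmetic can be written out as below. The class and method names are hypothetical, the figures are made up, and the result is only an upper bound, since part of each heap is used by the JVM and the space itself rather than by stored data.

```java
/** Back-of-the-envelope capacity arithmetic for a partitioned space (illustrative only). */
final class SpaceCapacity {

    /** Upper bound on data capacity: only primary instances add capacity. */
    static long aggregateHeapMb(int partitions, long heapPerPrimaryMb) {
        return (long) partitions * heapPerPrimaryMb;
    }

    /** Number of space instances to host: each partition has one primary plus its backups. */
    static int totalInstances(int partitions, int backupsPerPartition) {
        return partitions * (1 + backupsPerPartition);
    }

    public static void main(String[] args) {
        int partitions = 4;           // assumed number of partitions
        int backupsPerPartition = 1;  // assumed number of backups per partition
        long heapPerPrimaryMb = 2048; // assumed heap size of each primary's JVM

        System.out.println("aggregate primary heap (MB): "
                + aggregateHeapMb(partitions, heapPerPrimaryMb));
        System.out.println("space instances to host: "
                + totalInstances(partitions, backupsPerPartition));
    }
}
```

Backups add to the number of instances that must be hosted, but not to the amount of data the space can hold.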

4. A clustered space can "increase" its capacity dynamically by "moving" an existing space instance into a GSC with a larger heap size, or by initially hosting multiple partitions within the same JVM and later moving one or more of them into an empty GSC. Both can be done via the SLA. Let me know if you need an example of this.

5. The Mirror service is a classic component to run within the GSC. Once it is deployed into the GSC, the GSM will make sure it keeps running and is restarted if its JVM fails. This is part of the self-healing capabilities the product offers for deployed services. See an example of a Mirror deployed into the GSC at: http://gigaspaces.com/wiki/display/OLH/MirrorServiceExamples

If you have a standalone application, you can use the OpenSpaces API without actually deploying the business logic within the GSC. This gives you one API for all application layers, not to mention the ability to use remoting (SVF), which allows you to invoke business logic across the grid in one simple method call.

6. Each space class requires a set of metadata, very similar to a database table. This includes the list of indexed fields, the routing field, the ID field, FIFO behavior, replication mode behavior, etc. When using classes implementing the Entry interface, such metadata should be declared explicitly via the relevant classes/interfaces and methods (the Entry interface came long before the annotation and XML-based decoration concepts). Once GigaSpaces supported the POJO concept (or, more exactly, JavaBeans, as Hibernate and other frameworks support), we provided the ability to declare space decorations using the standard modes: annotations and external XML config (aka gs.xml). You can in fact generate gs.xml from hbm.xml files, or generate the POJO class from gs.xml.
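
To illustrate how getter-level annotations can carry this kind of metadata, here is a self-contained sketch. The annotations `IdField`, `RoutingField` and `IndexedField`, the `Account` class and the `MetadataReader` helper are hypothetical stand-ins declared locally so the example compiles on its own; they are not the GigaSpaces annotations, and the real product reads its metadata in its own way.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in annotations (not the GigaSpaces ones).
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface IdField {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface RoutingField {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface IndexedField {}

/** A JavaBean-style class whose getters are decorated with metadata. */
final class Account {
    private final String accountId;
    private final String organizationId;
    private final String status;

    Account(String accountId, String organizationId, String status) {
        this.accountId = accountId;
        this.organizationId = organizationId;
        this.status = status;
    }

    @IdField public String getAccountId() { return accountId; }
    @RoutingField @IndexedField public String getOrganizationId() { return organizationId; }
    @IndexedField public String getStatus() { return status; }
}

/** Reads the metadata back via reflection, as a container might do at class registration. */
final class MetadataReader {
    // Turns a getter name such as getStatus into the property name status.
    private static String propertyName(Method getter) {
        String name = getter.getName().substring(3);
        return Character.toLowerCase(name.charAt(0)) + name.substring(1);
    }

    /** Names of all properties marked as indexed, sorted for a stable order. */
    static List<String> indexedProperties(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isAnnotationPresent(IndexedField.class)) {
                names.add(propertyName(m));
            }
        }
        Collections.sort(names); // getDeclaredMethods returns methods in no particular order
        return names;
    }

    /** Value of the property marked as the routing field. */
    static Object routingValue(Object pojo) {
        for (Method m : pojo.getClass().getDeclaredMethods()) {
            if (m.isAnnotationPresent(RoutingField.class)) {
                try {
                    return m.invoke(pojo);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("cannot read routing property", e);
                }
            }
        }
        throw new IllegalArgumentException("no routing property declared");
    }

    public static void main(String[] args) {
        Account account = new Account("a1", "org-7", "ACTIVE");
        System.out.println("indexed: " + indexedProperties(Account.class));
        System.out.println("routing value: " + routingValue(account));
    }
}
```

Routing on something like the organization ID would fit the data set described in the question, where most data is specific to one organization, though that choice is an assumption of this sketch.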

Shay

answered 2008-04-25 20:52:36 -0500

shay hassidim

Comments

Thank you for such a detailed reply. A great deal of what you said is a bit over my head, but I'll continue to come back to your response as I work on this project.

mwelch (2008-04-26 22:53:39 -0500)
