Distributed Caching

GigaSpaces Enterprise Edition Distributed Caching

{color:orange}Adaptable Caching for Evolving System Requirements{color}

Different applications have different requirements of a caching solution, depending on the specific scenario in which the cache is used. For example, one application may be limited in its memory size and therefore unable to store all of its data in memory. In this case, it requires the ability to load the data it needs on demand, within the capacity available in the application's memory. This is used in conjunction with a remote cache configuration that provides virtual remote memory storage.
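
To make the on-demand pattern concrete, here is a minimal, generic sketch in plain Java (not the GigaSpaces API): the bounded LinkedHashMap plays the role of the limited local memory, and the hypothetical RemoteStore interface stands in for the remote cache acting as virtual memory.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded local cache that loads entries on demand from a remote store. */
public class OnDemandCache<K, V> {

    /** Hypothetical stand-in for the remote cache / data source. */
    public interface RemoteStore<K, V> {
        V fetch(K key);
    }

    private final RemoteStore<K, V> remote;
    private final Map<K, V> local;

    public OnDemandCache(RemoteStore<K, V> remote, final int maxEntries) {
        this.remote = remote;
        // An access-ordered LinkedHashMap evicts the least recently used
        // entry once the local memory budget (maxEntries) is exceeded.
        this.local = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized V get(K key) {
        V value = local.get(key);
        if (value == null) {             // miss: load on demand
            value = remote.fetch(key);   // remote storage as virtual memory
            if (value != null) {
                local.put(key, value);
            }
        }
        return value;
    }
}
{code}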

A second application may need the cache for read-mostly purposes. In such a case, read performance needs to be as close as possible to that of getting an object from local memory, concurrent update locking may not be required, and write performance is less important.

Transactional applications need a cache facility that can handle both write and read operations and maintain consistency across all of the distributed components running across the network.

To address all of these different requirements, GigaSpaces was built in a policy-driven manner. Most of the policies do not affect the actual application code, but rather the way each space instance interacts with the other space instances.
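
As a rough illustration of how such a policy is expressed outside the application code, the sketch below looks up a space whose clustering policy is named in the space URL. It assumes the OpenSpaces configurer API; the URL format, the sync_replicated schema name, and the space name mySpace are illustrative and may vary between product versions.

{code:java}
import org.openspaces.core.GigaSpace;
import org.openspaces.core.GigaSpaceConfigurer;
import org.openspaces.core.space.UrlSpaceConfigurer;

public class PolicyDrivenLookup {
    public static void main(String[] args) {
        // The application code stays the same; only the space URL (and the
        // policies it names) changes. "sync_replicated" is one example
        // cluster schema; "async_replicated" and "partitioned" are others.
        GigaSpace gigaSpace = new GigaSpaceConfigurer(
                new UrlSpaceConfigurer("/./mySpace?cluster_schema=sync_replicated")
                        .space()).gigaSpace();
        System.out.println("Connected to " + gigaSpace);
    }
}
{code}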

{color:orange}Scenarios for Using the Cache Service{color}

*Reducing the I/O Overhead When Accessing Distributed Information*

This is the most common scenario for using a cache facility. Accessing a remote process or a machine's hard drive is an expensive operation. In this scenario, the application improves access to a remote data source by bringing its data closer to the application. The cache facility provides the mechanism to store, synchronize, and manage the remote data source's information in local memory, thus shortening the time needed to access it.
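
One common way to realize this with GigaSpaces is to front the remote space with a local cache, so that reads are served from local memory and only go remote on a miss. The sketch below assumes the OpenSpaces configurer API; the exact class and method names may differ between versions, and the space URL is a placeholder.

{code:java}
import com.j_spaces.core.IJSpace;
import org.openspaces.core.GigaSpace;
import org.openspaces.core.GigaSpaceConfigurer;
import org.openspaces.core.space.UrlSpaceConfigurer;
import org.openspaces.core.space.cache.LocalCacheSpaceConfigurer;

public class LocalCacheFrontEnd {
    public static void main(String[] args) {
        // Proxy to the remote (master) space; URL and name are placeholders.
        IJSpace remote = new UrlSpaceConfigurer("jini://*/*/mySpace").space();

        // Wrap it with a local cache: reads hit local memory first and only
        // reach the remote space on a miss, cutting network round trips.
        IJSpace localCache = new LocalCacheSpaceConfigurer(remote).space();

        GigaSpace gigaSpace = new GigaSpaceConfigurer(localCache).gigaSpace();
        // gigaSpace.read(...) is now served from the local copy when possible.
    }
}
{code}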

*Reliable Memory Storage*

One of the problems with storing information in memory is the lack of reliability and sharing capabilities of memory resources. For example, there may be a need to maintain the application state while keeping it highly available: if one server fails, another can recover from the point of failure without losing the first server's state.

Caching adds reliability and sharing capabilities to memory resources. The application can therefore keep even critical information in memory, without compromising either performance or reliability.
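
For example, a worker could checkpoint its state into a replicated space so that, after a failure, another server reads it back and resumes. This sketch assumes the OpenSpaces GigaSpace API and a replicated space deployment; the WorkerState class and helper methods are illustrative.

{code:java}
import com.gigaspaces.annotation.pojo.SpaceId;
import org.openspaces.core.GigaSpace;

public class CheckpointExample {

    /** Illustrative state object checkpointed into the space. */
    public static class WorkerState {
        private String id;
        private int lastProcessedSeq;

        public WorkerState() {}                  // required no-arg constructor

        @SpaceId
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        public int getLastProcessedSeq() { return lastProcessedSeq; }
        public void setLastProcessedSeq(int seq) { this.lastProcessedSeq = seq; }
    }

    // Checkpoint: write the latest state into the replicated space.
    static void checkpoint(GigaSpace space, WorkerState state) {
        space.write(state);
    }

    // Recovery on another server: read the failed worker's last known state.
    static WorkerState recover(GigaSpace space, String workerId) {
        return space.readById(WorkerState.class, workerId);
    }
}
{code}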

*Distributed Session Sharing*

Session information is intermediate information. It normally contains counters, users' billing information, profiling information, and, in many cases, HTTP session data. Such information is critical to the application during the session, or even during a specific operation, yet it is completely useless afterwards. During these periods, session information must be accessible at high speed.

In distributed applications, the session information must be shared by multiple applications. A true caching system provides high-performance data storage optimized for session information, utilizing memory resources to offer high-speed access to it.
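
A sketch of this idea, assuming the OpenSpaces GigaSpace API: the illustrative SessionEntry class below is written with a lease, so the space automatically discards the session information once it is no longer useful.

{code:java}
import java.util.HashMap;
import java.util.Map;

import com.gigaspaces.annotation.pojo.SpaceId;
import org.openspaces.core.GigaSpace;

public class SessionSharing {

    /** Illustrative session entry shared across application instances. */
    public static class SessionEntry {
        private String sessionId;
        private Map<String, Object> attributes = new HashMap<String, Object>();

        public SessionEntry() {}                 // required no-arg constructor

        @SpaceId
        public String getSessionId() { return sessionId; }
        public void setSessionId(String id) { this.sessionId = id; }

        public Map<String, Object> getAttributes() { return attributes; }
        public void setAttributes(Map<String, Object> a) { this.attributes = a; }
    }

    static void saveSession(GigaSpace gigaSpace, SessionEntry session) {
        // A 30-minute lease lets the space expire the entry on its own
        // once the session is over and the information becomes useless.
        gigaSpace.write(session, 30 * 60 * 1000L);
    }

    static SessionEntry loadSession(GigaSpace gigaSpace, String sessionId) {
        return gigaSpace.readById(SessionEntry.class, sessionId);
    }
}
{code}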

*Caching of User Profiles and Personalized Information in a Portal Application*

In a portal application, personalization usually requires combining a user's profile preferences with specific page content. Since the HTML pages in a portal can be built from multiple portlets and pages, this process can be very expensive: it needs to access multiple remote processes to gather all the relevant data. While storing the profile information in memory may be adequate within a single process, in a distributed portal environment the user profile must be maintained consistently across all portal instances. Within such an environment, the cache facility provides access to users' profile information at in-memory speed, while still maintaining the consistency of that information across multiple portal instances.
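
As an illustration (again assuming the OpenSpaces GigaSpace API), a portal instance could fetch a profile by template matching, where the non-null fields of the template act as the query; the UserProfile class here is hypothetical.

{code:java}
import com.gigaspaces.annotation.pojo.SpaceId;
import org.openspaces.core.GigaSpace;

public class ProfileLookup {

    /** Hypothetical space class holding a user's profile preferences. */
    public static class UserProfile {
        private String userId;
        private String theme;

        public UserProfile() {}                  // required no-arg constructor

        @SpaceId
        public String getUserId() { return userId; }
        public void setUserId(String id) { this.userId = id; }

        public String getTheme() { return theme; }
        public void setTheme(String theme) { this.theme = theme; }
    }

    static UserProfile loadProfile(GigaSpace gigaSpace, String userId) {
        // Template matching: non-null fields act as the query criteria.
        // Every portal instance reads the same shared, consistent copy.
        UserProfile template = new UserProfile();
        template.setUserId(userId);
        return gigaSpace.read(template);
    }
}
{code}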

*Grid Environments*

A Grid environment is a means of utilizing distributed CPU resources to perform complex computational tasks. One of the bottlenecks in performing such tasks is access to the information associated with them. In many cases, this information resides in either a centralized database or a file system. As a result, access to this information becomes a bottleneck, which at some point can become a real obstacle to performing tasks in parallel. Examples of such information are the task information itself, previously calculated results, conversion tables, processing rules, and user profiles. The Data Grid caching system removes this bottleneck by allowing each processing unit to load this information into its own memory address space on demand.
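
A minimal sketch of the worker side of this pattern, assuming the OpenSpaces GigaSpace API; the Task class and the polling loop are illustrative. Each worker pulls work (and the data it needs) from the space on demand instead of funneling every request through a central database.

{code:java}
import com.gigaspaces.annotation.pojo.SpaceId;
import org.openspaces.core.GigaSpace;

public class GridWorker {

    /** Illustrative task entry; reference data can be loaded the same way. */
    public static class Task {
        private String id;
        private String payload;

        public Task() {}                         // required no-arg constructor

        @SpaceId(autoGenerate = true)
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        public String getPayload() { return payload; }
        public void setPayload(String payload) { this.payload = payload; }
    }

    // Poll the space for tasks; take() removes a matching entry, so each
    // task is processed by exactly one worker.
    public void workLoop(GigaSpace gigaSpace) {
        while (!Thread.currentThread().isInterrupted()) {
            Task task = gigaSpace.take(new Task(), 5000);  // wait up to 5s
            if (task != null) {
                process(task);
            }
        }
    }

    private void process(Task task) {
        // the actual computation goes here
    }
}
{code}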

{quote}This thread was imported from the previous forum. For your reference, the original is [available here|http://forum.openspaces.org/thread.jspa?threadID=2320]{quote}
