Welcome to the new Gigaspaces XAP forum. To recover your account, please follow these instructions.

Ask Your Question

How (or can you) join two non-embedded objects using SQLQuery?

I must be missing something so this is probably a dumb question :)

If I have two objects in memory that have a relationship (e.g. Book/Author where a Book has an Author), and I have all the Books and Authors in Gigaspaces, how do I do a join to get all the Books that have an Author with a last name of Smith?

For the sake of simplicity, let's assume that there is a one-to-one relationship between the two, even though normally there wouldn't be.

i.e. select b.* from Book b, Author a where b.authorId = a.id and a.lastName = 'Smith'

I read all the documentation on 'Embedded' versus 'non Embedded' but I can't seem to find an answer. It would be easy if Author was embedded in Book but, assuming it isn't, I don't see any examples of joins between non-embedded objects. How would I do the above join using SQLQuery so that I get only the Books that I want?

Is this possible?

{quote}This thread was imported from the previous forum. For your reference, the original is [available here|http://forum.openspaces.org/thread.jspa?threadID=4024]{quote}

asked 2013-01-24 15:45:02 -0500

rickfish gravatar image

updated 2013-08-08 09:52:00 -0500

jaissefsfex gravatar image
edit retag flag offensive close merge delete

1 Answer

Sort by ยป oldest newest most voted


The JDBC driver supports selection from multiple tables. The JDBC Join uses the Cartesian product of the tables data to form the result set. The JDBC Join will perform well when having tables with small/medium size (up to 1,000,000 rows).

Using embedded object model approach with the SQLQuery will perform much faster than JDBC as it will avoid the extra call involved with the JDBC Join activity.
See examples for embedded objects query:

Note you can use JDBC with POJO objects created via the GigaSpace interface - This can be very powerful combination - see nice example using also GigaSpaces DistributedTask API:


answered 2013-01-24 16:02:57 -0500

shay hassidim gravatar image
edit flag offensive delete link more


Thanks Shay. Yes, I completely understand the embedded approach. We are just figuring things out now, so maybe that is where we end up. But we are experimenting to see how our object model should be laid out. In the scenario we are trying, we have common objects that could be in more than one collection if we use the embedded approach and that would eat up a lot more memory than we want.

It seems like you're saying that we need to use the JDBC/Gigaspaces API hybrid approach just to do a join between two non-embedded objects. Is that correct or am I misinterpreting what you're saying?

rickfish gravatar imagerickfish ( 2013-01-25 07:55:06 -0500 )edit


It is true you need to model your data to match the type of queries you will be executing to optimize the query execution. Using SQLQuery with embedded objects will fit most scenarios. JDBC will fit most classical queries that involve order by and group by. Hybrid (SQLQuery + JDBC) with Distributed Task will support more advanced scenarios. Remember GigaSpaces data access is all amount in-memory map-reduce approach that let you access distributed data in parallel. Still, in some cases , especially when you query vast amount of data it is difficult to model the raw data to support fast query response time since there is a need to scan large tables or compare / sort large amount of index values.

GigaSpaces provide an elegant approach to deal with this problem : Counters - you can have half/fully baked meta data you can use to execute queries that span vast amount of data. These [Counters|http://wiki.gigaspaces.com/wiki/display/XAP91/Counters] can be updated once data is loaded in an explicit manner or implicitly via the [notify container|http://wiki.gigaspaces.com/wiki/display/XAP91/Notify+Container].

Instead of query the raw data and forcing your data model into something does would not fit the application flow you can access the counters data directly and present it to the user.

For example for a shipping company that would like to present a status of all the packages (WaitingForPickup, DepartureScan, InTransit , Delivered , etc) they have across the entire country grouped by cities / region / state for a certain scheduled delivery date (next 24 hours , next 7 days , etc) , generating such a report might involve multiple queries that will need to span the entire data set packages database, sort and group by vast amount of raw data. Instead , counters can be maintained , in a time series approach per city or region and have these being queried instead of the raw data. This will speed up the query execution dramatically. Such Big Data Real Time Analytics approach can be applied for flights , train cars , truck fleet , buses , Space debris (space junk) , etc...

Since I don't know your exact scenario I can't recommend a specific approach. You can contact me via shay at gigaspaces.com if you need additional help.

Regards, Shay

shay hassidim gravatar imageshay hassidim ( 2013-01-25 09:22:26 -0500 )edit


The [Modeling your data page|http://wiki.gigaspaces.com/wiki/display/XAP91/Modeling+your+data] has been updated to include info how the Embedded and non-Embedded model should be implemented.

The JDBC Driver can be used to perform such a join query , but it won't perform very well once there are many Author and Book objects.


shay hassidim gravatar imageshay hassidim ( 2013-01-30 13:23:21 -0500 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

1 follower


Asked: 2013-01-24 15:45:02 -0500

Seen: 716 times

Last updated: Jan 24 '13