Welcome to the new Gigaspaces XAP forum. To recover your account, please follow these instructions.

Ask Your Question
0

Is there an equivalent to T-SQL's "top n" in Gigaspaces SQLQuery?

Our product deals with large numbers of numeric-valued, dated observations. They are conceptually ordered by date, but they can come into the system out-of-order. Each observation has a "trend" value, which is a measure of how different its numerical value is from the numerical value of the previous observation. Obviously, if a new observation comes in, we need to calculate a trend for this new observation AND for the one that immediately follows it in time.

Consider a new observation Ox. We need an efficient way of calculating these trends. One way to do this would be to use two "top n" queries. One would get the observation Ow just before Ox (so I can calculate the trend for Ox itself). The other would get the observation Oy just after Ox (so I can REcalculate its trend).

Of course, either or both of Ow and Oy may not exist.

We could do this by reading ALL observations, sorting them, and finding Ow and Oy by walking the list -- but there could be hundreds of thousands of observations, I'd much prefer that Gigaspaces found Ow and Oy for me.

asked 2013-08-13 11:36:27 -0500

David S gravatar image
edit retag flag offensive close merge delete

5 Answers

Sort by ยป oldest newest most voted
0

how about using polling container to sequence your data having your own logic. you can put new objects if it is accessed frequently. http://wiki.gigaspaces.com/wiki/displ...

answered 2013-08-13 12:33:44 -0500

venkatg gravatar image
edit flag offensive delete link more
0

David,

I believe the approach described below might be relevant: http://wiki.gigaspaces.com/wiki/display/XAP96/Aggregators

The other option would be to maintain some counters to would avoid data query. See: http://wiki.gigaspaces.com/wiki/display/XAP96/Counters

Shay

answered 2013-08-13 13:32:33 -0500

shay hassidim gravatar image
edit flag offensive delete link more
0

Thank you for your quick replies, Venkat and Shay.

@Venkat: I don't think a polling container buys us anything. We see each observation as it comes in, so we already have a point in the code where we could impose our own logic on some in-memory structure we'd build. This, however would incur a large client-side memory hit. We'd prefer to leave the Observations in the space, and to pull into client space just the ones needed to do the trending calculations. The actual total number of Observations in the space can be in the millions, even though they can be categorized in such a way that any given trending operation pertains only to a subset of them.

@shay: The "Aggregator" approach (your first link) would have worked if there were "CeilingTask" and "FloorTask" aggregators to get (respectively) the earliest Observation whose date is greater than a given date and the latest Observation whose date is less than a given date. But these do not exist.

@shay: I tried repeatedly to follow your second link. I kept getting "page not found".

Thanks, -David

answered 2013-08-13 13:47:32 -0500

David S gravatar image
edit flag offensive delete link more

Comments

actually, I might be able to use the JDBC driver (mentioned in Shay's first link) to do what I want, since JDBC supports the "Limit n" syntax. I'm going to try that.

David S gravatar imageDavid S ( 2013-08-13 14:22:48 -0500 )edit
0

OK. But if the JDBC driver will not provide you what you need or will not be fast enough leveraging the above patterns will be the way to go.

If you must sort data across multiple partitions you will need to use a Distributed Task that will get the "top" objects from each partition and "reduce" at the client each intermediate result into one consolidated result. This is what the JDBC driver essentially doing when called from a remote client. Still , a custom code using projections will make it go way faster as we demonstrate with the Aggregators.

answered 2013-08-13 16:15:00 -0500

shay hassidim gravatar image
edit flag offensive delete link more
0

Thanks Shay,

We am not sorting data across partitions. The data being sorted is all in a single local partition. I don't think we need a "reduce' step. Is there still an advantage to writing our own Aggregator? Can you explain what that advantage is when dealing with a single local partition?

Also: we are still using Gigaspaces 9.1. Is Aggregator functionality even available in 9.1? It appears that gigaspace.execute() exists in the 9.1 API. Is that really all we need?

-David

answered 2013-08-14 14:15:26 -0500

David S gravatar image
edit flag offensive delete link more

Comments

yes - gigaspace.execute() is all what you need. It will avoid the need to transport data between a client and the space.

What about implementing counters to have fully baked data and avoid data query altogether?

shay hassidim gravatar imageshay hassidim ( 2013-08-14 14:50:41 -0500 )edit

At first blush I am not seeing how counters would help. Are you thinking that they would give me a way to sort my observations? An incoming observations already has an observationDate member. This member already imposes an ordering on the observations.

As I see it, the real problem is that an incoming observation may have a date that's earlier than a lot of existing observations. I need to find the previous and next observations for this incoming observation in order to update trending. Furthermore, I'd like to do this without increasing my process' memory footprint.

Please let me know if I've misunderstood the role of counters.

David S gravatar imageDavid S ( 2013-08-14 15:09:11 -0500 )edit

If all you need is ordering counters might not be your solution. Try SQLQuery with order by and see if this good enough for you. http://wiki.gigaspaces.com/wiki/displ...

results = gigapace.readMultiple(
    new SQLQuery<MyClass>(MyClass.class, "num > 1 ORDER BY name"));
shay hassidim gravatar imageshay hassidim ( 2013-08-14 15:36:44 -0500 )edit

I tried that with and without an aggregator and there was no real difference in performance or memory footprint. I am going to go with just the raw sqlquery (no aggregator). Thanks, Shay. -David

David S gravatar imageDavid S ( 2013-08-15 08:20:36 -0500 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

Stats

Asked: 2013-08-13 11:36:27 -0500

Seen: 310 times

Last updated: Aug 14 '13