# Is there an equivalent to T-SQL's "top n" in Gigaspaces SQLQuery?

Our product deals with large numbers of numeric-valued, dated observations. They are conceptually ordered by date, but they can come into the system out-of-order. Each observation has a "trend" value, which is a measure of how different its numerical value is from the numerical value of the previous observation. Obviously, if a new observation comes in, we need to calculate a trend for this new observation AND for the one that immediately follows it in time.

Consider a new observation Ox. We need an efficient way of calculating these trends. One way to do this would be to use two "top n" queries. One would get the observation Ow just before Ox (so I can calculate the trend for Ox itself). The other would get the observation Oy just after Ox (so I can REcalculate its trend).

Of course, either or both of Ow and Oy may not exist.

We could do this by reading ALL observations, sorting them, and finding Ow and Oy by walking the list -- but there could be hundreds of thousands of observations, I'd much prefer that Gigaspaces found Ow and Oy for me.

edit retag close merge delete

Sort by » oldest newest most voted

how about using polling container to sequence your data having your own logic. you can put new objects if it is accessed frequently. http://wiki.gigaspaces.com/wiki/displ...

more

David,

I believe the approach described below might be relevant: http://wiki.gigaspaces.com/wiki/display/XAP96/Aggregators

The other option would be to maintain some counters to would avoid data query. See: http://wiki.gigaspaces.com/wiki/display/XAP96/Counters

Shay

more

Thank you for your quick replies, Venkat and Shay.

@Venkat: I don't think a polling container buys us anything. We see each observation as it comes in, so we already have a point in the code where we could impose our own logic on some in-memory structure we'd build. This, however would incur a large client-side memory hit. We'd prefer to leave the Observations in the space, and to pull into client space just the ones needed to do the trending calculations. The actual total number of Observations in the space can be in the millions, even though they can be categorized in such a way that any given trending operation pertains only to a subset of them.

@shay: The "Aggregator" approach (your first link) would have worked if there were "CeilingTask" and "FloorTask" aggregators to get (respectively) the earliest Observation whose date is greater than a given date and the latest Observation whose date is less than a given date. But these do not exist.

Thanks, -David

more

actually, I might be able to use the JDBC driver (mentioned in Shay's first link) to do what I want, since JDBC supports the "Limit n" syntax. I'm going to try that.

( 2013-08-13 14:22:48 -0500 )edit

OK. But if the JDBC driver will not provide you what you need or will not be fast enough leveraging the above patterns will be the way to go.

If you must sort data across multiple partitions you will need to use a Distributed Task that will get the "top" objects from each partition and "reduce" at the client each intermediate result into one consolidated result. This is what the JDBC driver essentially doing when called from a remote client. Still , a custom code using projections will make it go way faster as we demonstrate with the Aggregators.

more

Thanks Shay,

We am not sorting data across partitions. The data being sorted is all in a single local partition. I don't think we need a "reduce' step. Is there still an advantage to writing our own Aggregator? Can you explain what that advantage is when dealing with a single local partition?

Also: we are still using Gigaspaces 9.1. Is Aggregator functionality even available in 9.1? It appears that gigaspace.execute() exists in the 9.1 API. Is that really all we need?

-David

more

yes - gigaspace.execute() is all what you need. It will avoid the need to transport data between a client and the space.

What about implementing counters to have fully baked data and avoid data query altogether?

( 2013-08-14 14:50:41 -0500 )edit

At first blush I am not seeing how counters would help. Are you thinking that they would give me a way to sort my observations? An incoming observations already has an observationDate member. This member already imposes an ordering on the observations.

As I see it, the real problem is that an incoming observation may have a date that's earlier than a lot of existing observations. I need to find the previous and next observations for this incoming observation in order to update trending. Furthermore, I'd like to do this without increasing my process' memory footprint.

Please let me know if I've misunderstood the role of counters.

( 2013-08-14 15:09:11 -0500 )edit

If all you need is ordering counters might not be your solution. Try SQLQuery with order by and see if this good enough for you. http://wiki.gigaspaces.com/wiki/displ...

results = gigapace.readMultiple(
new SQLQuery<MyClass>(MyClass.class, "num > 1 ORDER BY name"));

( 2013-08-14 15:36:44 -0500 )edit

I tried that with and without an aggregator and there was no real difference in performance or memory footprint. I am going to go with just the raw sqlquery (no aggregator). Thanks, Shay. -David

( 2013-08-15 08:20:36 -0500 )edit